CN108763548A - Method, apparatus, device and computer-readable storage medium for collecting training data

Info

Publication number: CN108763548A
Authority: CN (China)
Prior art keywords: message, user, reply, replied, training data
Legal status: Pending (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Application number: CN201810553778.3A
Other languages: Chinese (zh)
Inventors: 王矩, 张晶晶, 孙珂
Current Assignee: Beijing Baidu Netcom Science and Technology Co Ltd (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Original Assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN201810553778.3A
Publication of CN108763548A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

According to example embodiments of the present disclosure, a method, apparatus, electronic device, and computer-readable storage medium for collecting training data are provided. The method includes obtaining a first message from a user and obtaining a second message from the user directed at a first reply, where the first reply is generated based on the first message, and both the first message and the second message are in natural language form. The method further includes, in response to determining that the second message semantically clarifies the first message, determining the first message as training data for training chat dialogue. According to embodiments of the present disclosure, when the robot's reply is adjusted because the user clarified an earlier message during a chat dialogue, the user's chat message can be automatically added to the training samples. High-quality training data can thus be collected directly from natural language dialogue, which reduces training cost and improves training efficiency.

Description

Method, apparatus, device and computer-readable storage medium for collecting training data
Technical field
Embodiments of the present disclosure relate generally to the field of artificial intelligence, and more particularly to a method, apparatus, electronic device, and computer-readable storage medium for collecting training data.
Background
In recent years, the concept of "Conversation as a Platform" has gained wide acceptance, and a growing number of network products and applications have begun to adopt conversational human-computer interaction. A chat robot is a computer program or software that interacts with humans through text, voice, pictures, and the like; it can understand the content a user sends and respond automatically. A chat robot can stand in for a human in conversation to a certain extent and can be integrated into a dialogue system as an automatic online assistant, for scenarios such as intelligent chat, customer service, and information inquiry.
To make a chat robot more intelligent and able to chat in a human-like manner, it usually needs to be trained with training data. Training data is annotated data used to train a machine learning model and can be used to improve the model's performance. Typically, in the training process of a chat robot, dialogue sample annotation, model training, and effect verification are performed separately as relatively independent functions. For example, model training can be carried out only after a certain quantity of dialogue samples has been annotated for a business scenario; after the model is trained, it is further necessary to converse with the chat robot again to test and verify the effect, while the performance of the chat robot is assessed in parallel through manual recording.
Summary of the invention
According to example embodiments of the present disclosure, a method, apparatus, electronic device, and computer-readable storage medium for collecting training data are provided.
In a first aspect of the present disclosure, a method for collecting training data is provided. The method includes: obtaining a first message from a user; obtaining a second message from the user directed at a first reply, where the first reply is generated based on the first message, and both the first message and the second message are in natural language form; and, in response to determining that the second message semantically clarifies the first message, determining the first message as training data for training chat dialogue.
In a second aspect of the present disclosure, an apparatus for collecting training data is provided. The apparatus includes: a first message obtaining module configured to obtain a first message from a user; a second message obtaining module configured to obtain a second message from the user directed at a first reply, where the first reply is generated based on the first message, and both the first message and the second message are in natural language form; and a training data determining module configured to, in response to determining that the second message semantically clarifies the first message, determine the first message as training data for training chat dialogue.
In a third aspect of the present disclosure, an electronic device is provided, comprising one or more processors and a storage device for storing one or more programs. The one or more programs, when executed by the one or more processors, cause the electronic device to implement a method or process according to embodiments of the present disclosure.
In a fourth aspect of the present disclosure, a computer-readable medium is provided, on which a computer program is stored; the program, when executed by a processor, implements a method or process according to embodiments of the present disclosure.
It should be understood that the content described in this Summary is not intended to identify key or essential features of embodiments of the present disclosure, nor is it intended to limit the scope of the present disclosure. Other features of the present disclosure will become readily understood from the following description.
Brief description of the drawings
The above and other features, advantages, and aspects of embodiments of the present disclosure will become more apparent from the following detailed description taken in conjunction with the accompanying drawings. In the drawings, the same or similar reference numerals denote the same or similar elements, in which:
Fig. 1 shows a schematic diagram of an example environment in which embodiments of the present disclosure can be implemented;
Fig. 2 shows a diagram of a graphical user interface (GUI) of an example dialogue between a user and a chat robot according to an embodiment of the present disclosure;
Fig. 3 shows a flow chart of a method for collecting training data according to an embodiment of the present disclosure;
Fig. 4 shows a flow chart of a method for obtaining training samples according to an embodiment of the present disclosure;
Fig. 5 shows a schematic diagram of annotation information for a user message according to an embodiment of the present disclosure;
Fig. 6 shows a flow chart of a process of clarifying a first message using a second message according to the present disclosure;
Fig. 7 shows a block diagram of an apparatus for collecting training data according to an embodiment of the present disclosure; and
Fig. 8 shows a block diagram of an electronic device capable of implementing multiple embodiments of the present disclosure.
Detailed description
Embodiments of the present disclosure are described in more detail below with reference to the accompanying drawings. Although certain embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure can be implemented in various forms and should not be construed as limited to the embodiments set forth herein; rather, these embodiments are provided for a more thorough and complete understanding of the present disclosure. It should be understood that the drawings and embodiments of the present disclosure are for illustrative purposes only and are not intended to limit the scope of protection of the present disclosure.
In the description of embodiments of the present disclosure, the term "comprising" and similar terms should be understood as open-ended inclusion, that is, "including but not limited to". The term "based on" should be understood as "based at least in part on". The term "one embodiment" should be understood as "at least one embodiment". Other explicit and implicit definitions may also be included below.
When samples are added and annotated during training, a traditional dialogue system requires user messages to be entered, and they must be entered again when the effect is verified. This obvious duplication of work makes development inefficient. One improvement over the conventional method is to complete dialogue sample verification, dialogue sample annotation, and dialogue effect evaluation in a single pass in the GUI. However, this improved method requires waking up a trainer to modify the annotation information in real time (using a shortcut key or an @trainer mention), and only fixed formats can be used (for example, the fixed format for negating an intent is SYS_OTHER). It also requires switching back and forth between keyboard and mouse, which makes the input process cumbersome and input efficiency low. Moreover, the scope supported by the trainer method is limited: only corrections to intents and word slots in complex dialogue logic are supported, whereas the response units required by fixed question-and-answer scenarios cannot be corrected. Meanwhile, the trainer method is aimed mainly at developers and data assistants, and is unsuitable for end users who encounter various abnormal situations in real scenarios. Therefore, collecting training data with conventional methods is costly and very inefficient.
Embodiments of the present disclosure propose a scheme for quickly collecting training data. According to embodiments of the present disclosure, when the robot's reply is adjusted in a chat dialogue because the user clarified an earlier message, the user's chat message can be automatically added to the training samples, without the user having to wake up a trainer or use instructions in a predetermined format. Embodiments of the present disclosure therefore make it possible to collect high-quality training data directly from natural language dialogue, which both reduces training cost and improves training efficiency. With the method according to embodiments of the present disclosure, the user can annotate the dialogue within the GUI and experience better dialogue results, and can correct the chat robot's erroneous recognition results through the dialogue itself.
Embodiments of the present disclosure thus hand the process of correcting the chat robot over to the end user, and error-correcting annotation can be accomplished during the user's normal natural language dialogue, so that high-quality annotated data can be collected quickly to help optimize the chat robot. In addition, with the method according to embodiments of the present disclosure, the data in the question-and-answer library of fixed question-and-answer scenarios can also be optimized based on the user's corrections. Some example embodiments of the present disclosure are described in detail below with reference to Figs. 1-8.
Fig. 1 shows a schematic diagram of an example environment 100 in which embodiments of the present disclosure can be implemented. In the example environment 100, a user 110 conducts a chat dialogue 115 with a chat robot 120 (also referred to as a "chat engine"). Optionally, the user 110 may be local to the chat robot 120, that is, the user 110 can converse with the chat robot 120 directly. Alternatively, the user 110 may use a local device (a laptop computer, desktop computer, smartphone, tablet computer, etc.) to chat with the chat robot 120 over a network. It should be understood that the chat robot 120 may be deployed in a local electronic device, in the cloud, or in a distributed manner.
Referring to Fig. 1, the environment 100 further includes an annotation robot 130, which can obtain dialogue samples 125 from the chat robot 120 in real time or non-real time and automatically annotate certain dialogue samples identified using embodiments of the present disclosure. In the non-real-time scenario, the annotation robot 130 can identify dialogue samples that can be annotated from the chat dialogue history. In the real-time scenario, the annotation robot 130 can analyze in real time, for each user message, whether it can be added as a dialogue sample. After annotation is complete, the annotation robot 130 sends annotated data 135 to a training data set 140, which can be used for training 145 the chat robot 120, improving its accuracy and intelligence. Besides obtaining training data from user dialogues, manually and/or automatically annotated training data can also be obtained from big data 150 for training the chat robot 120.
The annotation robot 130 may be located locally or remotely relative to the chat robot 120 (for example, deployed in the cloud or in a distributed manner). It should be understood that although the chat robot 120 and the annotation robot 130 are shown separately in Fig. 1, the annotation robot 130 may also be integrated with the chat robot 120 to form a chat dialogue system. In addition, the chat robot 120 may itself integrate the function of automatically annotating dialogue samples, in which case the chat robot 120 can also annotate data in real time or non-real time.
Fig. 2 shows a diagram of an example GUI 200 of an example dialogue between a user and a chat robot according to an embodiment of the present disclosure. For example, the chat dialogue shown in GUI 200 may be the chat dialogue 115 between the user 110 and the chat robot 120 described above with reference to Fig. 1.
As shown in Fig. 2, after the user 110 opens a chat window with the chat robot 120, the chat robot 120 can first send a greeting message 201 (for example, "Hello, I am a chat robot. You can talk with me now, give it a try."). Besides the chat dialogue 201-224, GUI 200 also includes a window 250 in which the user 110 enters messages and a button 260 for sending messages. It should be understood that the user 110 can also converse with the chat robot 120 by other means (such as voice).
Referring to 211-212 in Fig. 2, the user 110 can initiate a first-round chat dialogue with the chat robot 120 in the chat window. For example, the user 110 sends a query message 211 about the weather (for example, "How is the weather in Beijing tomorrow?"); the chat robot 120 correctly identifies the user's intent to query the weather and provides reply 212 (for example, "Beijing will be sunny tomorrow, with temperatures of 18-33 degrees.").
With continued reference to Fig. 2, the user 110 can also initiate a second-round dialogue 220 with the chat robot 120. For example, the user 110 sends a message 221 about entertainment (for example, "I want to watch 《Blame sincere not faze》"). At this point the chat robot 120 cannot correctly identify the user's intent, and therefore provides reply 222 indicating that the intent could not be identified (for example, "Sorry, I do not understand what you mean."). Next, the user 110 sends a new message 223 that clarifies the previous message 221 (for example, "I want to watch a movie"). Because message 223 semantically clarifies message 221, the chat robot 120 can accurately identify that the user's intent is to watch a movie, and can then provide an accurate reply 224 (for example, "Playing the film 《Blame sincere not faze》.").
In the example GUI 200 of Fig. 2, in the first-round dialogue 211-212, the chat robot 120 correctly parses message 211 and accurately identifies the intent of the user 110, so this dialogue sample generally does not need to be annotated again for training the chat robot 120. In the second-round dialogue 220 (i.e. 221-224), by contrast, the chat robot 120 cannot accurately identify the intent of the user's message 221; only after the user clarifies message 221 with message 223 can the chat robot 120 accurately identify the intent of the user 110, so this dialogue sample is better suited to being annotated for training the chat robot 120. A chat robot trained with dialogue 220 can, the next time it encounters message 221 or a similar message, determine the user's intent directly from that message, without the user having to send a further clarifying message 223. Embodiments of the present disclosure can therefore collect high-quality training data directly from natural language dialogue without additional manual annotation by the user, improving the efficiency of collecting training data. In addition, training data collected according to embodiments of the present disclosure usually involves real abnormal scenarios (bad cases) that the chat robot could not identify accurately, so such training data does more to improve the accuracy and intelligence of the chat robot.
Fig. 3 shows a flow chart of a method 300 for collecting training data according to an embodiment of the present disclosure. It should be understood that method 300 can be performed by the annotation robot 130 described above with reference to Fig. 1. In addition, where the chat robot 120 has an annotation function, method 300 can also be performed by the chat robot 120.
In block 302, a first message from a user (i.e. the user message of the first round of chat) is obtained. For example, referring to Fig. 1 and Fig. 2, the annotation robot 130 obtains message 221 from the user 110 in real time or non-real time. The chat robot 120 can generate a corresponding reply 222 for message 221, where reply 222 indicates that the chat robot 120 failed to accurately identify the user's intent, for example because it could not identify the intent or identified it incorrectly.
In block 304, a second message from the user directed at the first reply (i.e. the user message of the second round of chat) is obtained, where the first reply is generated based on the first message, and both the first message and the second message are in natural language form. For example, referring to Fig. 1 and Fig. 2, the annotation robot 130 obtains, in real time or non-real time, message 223 from the user 110 directed at reply 222. In embodiments of the present disclosure, messages from the user 110 are in natural language form rather than a fixed machine-language format, so the user needs no prior knowledge or training. For example, the chat robot 120 can generate reply 224 for message 223. In some embodiments, reply 224 matches the intent of the user 110 better than reply 222.
In block 306, in response to determining that the second message semantically clarifies the first message, the first message is determined as training data for training chat dialogue. For example, message 223 in Fig. 2 semantically clarifies message 221: the user 110 tells the chat robot 120 that the intent is to watch the film 《Blame sincere not faze》. In this case the chat robot 120 could not correctly parse message 221 at first, and identified the intent of the user 110 only after message 223 clarified message 221. Such a message 221 is therefore better suited to being annotated for training the chat robot 120. Several example situations in which the second message semantically clarifies the first message are described below with reference to Fig. 6.
Method 300 according to embodiments of the present disclosure can automatically collect high-quality training data from natural language dialogue, reducing training cost and improving training efficiency.
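Blocks 302-306 can be illustrated with a minimal Python sketch. It is not the implementation of the disclosure: the `Turn` type, the `collect_training_data` function, and the `clarifies` callback are hypothetical names, and the toy check used here (treating any follow-up to an "I do not understand" reply as a clarification) is only an assumed stand-in for the semantic determination, which the text does not specify.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Turn:
    """One round of chat: a user message and the chat robot's reply to it."""
    user_message: str
    robot_reply: str


# Hypothetical signature: (first message, first reply, second message) -> bool
ClarifyCheck = Callable[[str, str, str], bool]


def collect_training_data(turns: List[Turn], clarifies: ClarifyCheck) -> List[str]:
    """Return every first message that the next user message clarifies."""
    collected = []
    for first, second in zip(turns, turns[1:]):
        # Block 302: first.user_message; block 304: second.user_message,
        # which was sent in response to first.robot_reply.
        if clarifies(first.user_message, first.robot_reply, second.user_message):
            # Block 306: keep the first message as training data.
            collected.append(first.user_message)
    return collected


def fallback_reply_check(first_message: str, first_reply: str, second_message: str) -> bool:
    """Toy stand-in (assumption): a follow-up to a fallback reply counts as a clarification."""
    return "do not understand" in first_reply.lower()


history = [
    Turn("How is the weather in Beijing tomorrow?", "Beijing will be sunny tomorrow."),
    Turn("I want to go to West Second Qi", "Sorry, I do not understand what you mean."),
    Turn("Navigate to West Second Qi", "Looking up the navigation route to West Second Qi."),
]
print(collect_training_data(history, fallback_reply_check))
```

Passing the clarification check in as a callback keeps the collection loop independent of how the semantic determination is actually made, so a real classifier could replace the toy heuristic without changing the loop.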
Fig. 4 shows a flow chart of a method 400 for obtaining training samples according to an embodiment of the present disclosure. It should be understood that method 400 can be performed by the annotation robot 130 described above with reference to Fig. 1. In addition, where the chat robot 120 has an annotation function, method 400 can also be performed by the chat robot 120. Blocks 402-406 may be an example implementation of block 306 described above with reference to Fig. 3.
In block 402, a dialogue sample to be annotated is identified in the chat dialogue. For example, the annotation robot 130 obtains dialogue samples in real time or non-real time, identifies situations in which the user's second message semantically clarifies the first message, and determines such dialogues as dialogue samples to be annotated. Several example situations in which the second message semantically clarifies the first message are described below with reference to Fig. 6; for example, the annotation robot 130 can identify situations in which the reply of the chat robot 120 was changed or adjusted based on the clarification by the user 110.
In block 404, annotation information for the user message is determined. The annotation information is for the first message, but it is obtained based on the second message that clarifies the first message. That is, the first-round message is annotated based on the result obtained after the second-round intervention. In some embodiments, the annotation information may include an intent type and word slot information, and the word slot information may include a word slot type and a word slot value. An example of annotation information is described below with reference to Fig. 5.
In block 406, the user message and the annotation information are added to the training samples in association with each other. For example, the annotation robot 130 adds the annotated data (which includes the user message and the corresponding annotation information) to the training data set 140 for training the chat robot 120. For example, the intent type corresponding to message 221 in Fig. 2 is annotated as watching a movie. The next time the chat robot 120 receives a message identical or similar to message 221, it can accurately identify that the user's intent is to watch a movie. In this way, method 400 according to embodiments of the present disclosure can not only collect dialogue samples for training but also annotate them automatically, improving the efficiency of obtaining training data.
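The idea of blocks 404-406, annotating the first message with the result obtained after the clarifying second message, can be sketched as follows. This is an illustrative sketch only: the function name, the dictionary layout, and the `navigate` and `destination` labels are assumptions, and the parse result is supplied by hand rather than produced by a real chat robot.

```python
from typing import Any, Dict, List


def annotate_from_clarification(
    first_message: str,
    second_round_parse: Dict[str, Any],
    training_set: List[Dict[str, Any]],
) -> Dict[str, Any]:
    """Attach the intent and word slots recognized after the clarification to the FIRST message."""
    sample = {
        "message": first_message,  # the message that was originally not understood
        "intent": second_round_parse["intent"],
        "slots": dict(second_round_parse.get("slots", {})),
    }
    training_set.append(sample)  # block 406: store message and annotation together
    return sample


training_set: List[Dict[str, Any]] = []

# Hypothetical parse the chat robot produced once the user had clarified.
parse_after_clarification = {"intent": "navigate", "slots": {"destination": "West Second Qi"}}

annotate_from_clarification("I want to go to West Second Qi", parse_after_clarification, training_set)
print(training_set)
```

The point of the sketch is the pairing: the stored text is the original, misunderstood message, while the label comes from the later round in which the chat robot did understand.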
Fig. 5 shows a schematic diagram 500 of annotation information for a user message according to an embodiment of the present disclosure. As shown in Fig. 5, for user message 510 (for example, "Help me reserve a private room for ten people at Quanjude at 6 o'clock tonight"), annotation information 520 can be determined as follows: the intent type is reserving a restaurant, and this intent has three word slots, namely restaurant name, time, and number of people, whose values are "Quanjude", "2018.05.29 18:00", and "10" respectively. In the scenario according to embodiments of the present disclosure where the second message semantically clarifies the first message, the chat robot does not accurately identify the first message at first (for example, it cannot annotate it or annotates it incorrectly), but after the second message clarifies the first message, the chat robot can annotate the first message accurately. That is, the first-round user message is annotated based on the result obtained after the second-round intervention.
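The annotation information of Fig. 5 (an intent type plus word slots, each with a type and a value) maps naturally onto a small data structure. The sketch below is one possible representation, not the format used by the disclosure; the class names and the identifiers `reserve_restaurant`, `restaurant_name`, `time`, and `party_size` are hypothetical labels chosen for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class WordSlot:
    slot_type: str  # e.g. restaurant name, time, number of people
    value: str


@dataclass(frozen=True)
class Annotation:
    intent_type: str
    word_slots: Tuple[WordSlot, ...]

    def slot_value(self, slot_type: str) -> Optional[str]:
        """Return the value of the first slot with the given type, or None if absent."""
        for slot in self.word_slots:
            if slot.slot_type == slot_type:
                return slot.value
        return None


# Annotation for the restaurant reservation message described for Fig. 5.
fig5_annotation = Annotation(
    intent_type="reserve_restaurant",
    word_slots=(
        WordSlot("restaurant_name", "Quanjude"),
        WordSlot("time", "2018.05.29 18:00"),
        WordSlot("party_size", "10"),
    ),
)
print(fig5_annotation.slot_value("time"))
```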
In some embodiments, word slots can be defined for each intent type. For example, for a navigation intent, the word slots may include an origin and a destination. The word slots of the navigation intent may also include a waypoint as an optional word slot. In some embodiments, if the first message already contains word slot values but the chat robot can determine only the user's intent type and not all of the word slot values, the chat robot can ask the user back for clarification, or the user can actively clarify in response to the reply the chat robot provides.
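A per-intent word slot definition and the ask-back behaviour can be sketched as below. The schema layout, the split into required and optional slots, and the wording of the follow-up question are all assumptions made for illustration; the text only states that a navigation intent may have an origin, a destination, and an optional waypoint.

```python
from typing import Dict, List, Optional

# Hypothetical schema: which word slots each intent type expects.
SLOT_SCHEMA: Dict[str, Dict[str, List[str]]] = {
    "navigate": {"required": ["origin", "destination"], "optional": ["waypoint"]},
}


def missing_required_slots(intent: str, filled: Dict[str, str]) -> List[str]:
    """List required word slots of the intent that have no value yet."""
    return [name for name in SLOT_SCHEMA[intent]["required"] if not filled.get(name)]


def ask_back(intent: str, filled: Dict[str, str]) -> Optional[str]:
    """Return a follow-up question for the first missing slot, or None if nothing is missing."""
    missing = missing_required_slots(intent, filled)
    if not missing:
        return None
    return f"Could you tell me the {missing[0]}?"


print(ask_back("navigate", {"origin": "West Second Qi"}))
```

A reply produced by `ask_back` is a question, so the user's answer to it would fall under passive clarification in the sense of Fig. 6.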
Fig. 6 shows a flow chart of a process 600 of clarifying a first message using a second message according to the present disclosure. It should be understood that Fig. 6 shows some example situations in which the user's second message, as described above with reference to Figs. 1-4, semantically clarifies the first message.
In block 602, the user sends a first message to the chat robot. In block 604, the chat robot provides the user with a first reply to the first message. Next, in block 606, it is determined whether the first reply is a question. If the first reply is not a question (for example, it is a declarative sentence), the user needs to clarify on their own initiative: in block 608, the user actively clarifies by sending a second message to the chat robot, and in block 610, based on the user's active clarification, the chat robot provides a second reply that better matches the user's intent. If it is determined in block 606 that the first reply is a question, the chat robot is actively asking back and the user merely clarifies passively: in block 612, the user passively clarifies by sending a second message to the chat robot, and in block 614, based on the user's answer to the question in the first reply, the chat robot provides a second reply that better matches the user's intent.
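The branch at block 606 can be sketched in a few lines. How the disclosure decides whether a reply is a question is not stated; the punctuation test below is an assumed heuristic, and the function names are hypothetical.

```python
def is_question(reply: str) -> bool:
    """Heuristic (assumption): a reply ending in a question mark is a question."""
    return reply.rstrip().endswith(("?", "？"))


def clarification_mode(first_reply: str) -> str:
    """Block 606: a question leads to passive clarification, anything else to active clarification."""
    return "passive" if is_question(first_reply) else "active"


print(clarification_mode("Sorry, I do not understand what you mean."))
print(clarification_mode("Setting off from West Second Qi. Where would you like to go?"))
```

The check accepts both the ASCII and the full-width question mark, since the dialogues in the source are in Chinese.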
Next, several example situations in which the second message semantically clarifies the first message are shown, including: the user actively explaining during active clarification (the first reply is a declarative sentence), the user actively correcting during active clarification, the user answering a counter-question during passive clarification (the first reply is a question), and the user selecting a candidate result from the counter-question during passive clarification. It should be understood that the following situations are merely examples for describing embodiments of the present disclosure and do not limit the scope of the present disclosure. The scope of the present disclosure is defined by the appended claims.
I. Active clarification by the user: the user actively explains
When the chat robot misidentifies or cannot identify the user's intent, the user can actively explain. For example, in a scenario where the second message semantically clarifies the first message, if the first reply the chat robot provides to the user is a declarative sentence and does not accurately identify the user's intent, the user can actively explain through the second message. In some embodiments, the second message may include additional information that explains the first message, and the chat robot generates a second reply to the second message based on the additional information. For example, Tables 1-3 below show dialogue examples in which the user actively explains the first message through the second message.
Table 1
User's first message: I want to watch 《Blame sincere not faze》
Chat robot's first reply: Sorry, I do not understand what you mean
User's second message: I want to watch a movie
Chat robot's second reply: Playing the film 《Blame sincere not faze》
Table 2
Table 3
User's first message: West Second Qi, set off
Chat robot's first reply: Going to West Second Qi
User's second message: I want to set off from West Second Qi
Chat robot's second reply: Setting off from West Second Qi. Where would you like to go?
In the example of Table 1, the chat robot's first reply fails to identify the user's intent; the user's second message actively explains that the intent is to watch a movie, and based on this explanation and clarification the chat robot can determine that the user intends to watch the movie 《Blame sincere not faze》. In the example of Table 2, the chat robot's first reply misidentifies the user's intent as playing the variety show 《Blame sincere not faze》; the user's second message explains that the real intent is to watch the movie rather than the variety show, and based on this explanation and clarification the chat robot can determine that the user intends to watch the movie 《Blame sincere not faze》. In the example of Table 3, the chat robot initially misidentifies a word slot type in the user's intent (the departure place "West Second Qi" is identified as the destination); the user's second message actively explains that "West Second Qi" is the departure place rather than the destination. In this way, by explaining the first message with a second message in natural language form, the user enables the chat robot to identify the user's intent accurately.
It should be understood that an explanation clarifies on the basis of the original message, whereas a supplement adds entirely new content to it; the two differ in meaning. Consider the case where some word slots are missing: for example, the first message involves only the intent to book a ticket, with no destination word slot value, and the second message contains the destination word slot value. This is a case of the second message supplementing the first message.
II. Active clarification by the user: the user actively corrects
When the chat robot misidentifies or cannot identify the user's intent, the user can actively correct. For example, in a scenario where the second message semantically clarifies the first message, if the first reply the chat robot provides to the user is a declarative sentence and does not accurately identify the user's intent, the user can actively correct through the second message. In some embodiments, the second message may include a correction of a recognition error or typing error in the first message, and the chat robot can generate a second reply to the second message based on the correction. For example, Tables 4-8 below show dialogue examples in which the user actively corrects the first message through the second message.
Table 4
First message of the user: I will go to West Second Qi
First reply of the chat robot: Sorry, I do not understand what you mean
Second message of the user: Navigate to West Second Qi
Second reply of the chat robot: Querying the navigation route to West Second Qi
Table 5
First message of the user: I will go to west two strange
First reply of the chat robot: Sorry, I do not understand what you mean
Second message of the user: It is West Second Qi
Second reply of the chat robot: Querying the navigation route to West Second Qi
Table 6
First message of the user: Look up the contact information of Wang Feng
First reply of the chat robot: Looking up the contact information of Wang Feng, please wait
Second message of the user: It is the rich of good harvest
Second reply of the chat robot: Looking up the contact information of Wang Feng, please wait
Table 7
First message of the user: I will go to practise two very
First reply of the chat robot: Sorry, I do not understand what you mean
Second message of the user: It is the west of thing, the flag of national flag
Second reply of the chat robot: Querying the navigation route to West Second Qi
Table 8
First message of the user: Call 13511652271
First reply of the chat robot: Calling 13511652271
Second message of the user: It is 110
Second reply of the chat robot: Calling 13511052271
In the examples of Tables 4, 5 and 7, the chat robot fails to identify the user's intention in the first reply. In the examples of Tables 6 and 8, the chat robot misidentifies the user's intention in the first reply. In response to the chat robot's first reply, the user actively corrects a recognition error or typing error in the first message by using the second message (for example, by correcting one or more words, digits, etc.), so that the chat robot can accurately identify the user's intention in the second reply. In addition, the second message may also completely or partially repeat the content of the first message. It should be appreciated that repetition also has a corrective effect: the next time a user sends the same message, the chat robot can identify it correctly.
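The remark that the second message may completely or partially repeat the first message can be made concrete with a simple overlap measure. The following is a minimal sketch under stated assumptions and not the method of the disclosure: the function name `repetition_kind`, the 0.5 threshold, and whitespace tokenization are all hypothetical choices (Chinese text would need a word segmenter instead of `split`).

```python
def _tokens(message: str) -> set:
    """Lowercased whitespace tokens; a stand-in for real word segmentation."""
    return set(message.lower().split())


def repetition_kind(first: str, second: str, threshold: float = 0.5) -> str:
    """Return "complete", "partial" or "none" for how far `second` repeats `first`."""
    first_tokens, second_tokens = _tokens(first), _tokens(second)
    if not second_tokens:
        return "none"
    if first.strip().lower() == second.strip().lower():
        return "complete"
    # Share of the second message's tokens that already occurred in the first one.
    overlap = len(first_tokens & second_tokens) / len(second_tokens)
    return "partial" if overlap >= threshold else "none"


# Table 4: the second message reuses most of the first message's words.
print(repetition_kind("I will go to West Second Qi", "Navigate to West Second Qi"))
# Table 8: the second message is a pure correction and shares no tokens.
print(repetition_kind("Phone 13511652271", "It is 110"))
```

A repetition detected this way would still need the semantic check described above before the first message is collected as training data.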
III. system counter-question clarification --- user answers the counter-question
In the case where the chat robot is not sure of the user's intention, it can actively ask a counter-question so that the user clarifies, and the user can give feedback on the result to continue the dialogue. For example, in a scenario where the second message semantically clarifies the first message, if the first reply provided by the chat robot to the user is in question form, the user can answer the chat robot's counter-question through the second message. In some embodiments, the second message may include an answer to the counter-question in the first reply, and the chat robot generates a second reply to the second message based on the answer. For example, Tables 9-11 below show dialogue examples in which the user answers the chat robot's counter-question through the second message.
Table 9
Table 10
Table 11
In the example of Table 9, the chat robot recognizes only one result whose confidence is not high enough (namely, ordering a train ticket) in the first reply, and the chat robot confirms it by asking the user a counter-question. Depending on whether the result in the counter-question is accurate, the user can affirm or deny it through natural language dialogue to continue the dialogue. For example, the user can confirm that the intention is to order a train ticket, and the chat robot then accurately identifies the intention of ordering a train ticket. In the examples of Tables 10 and 11, the chat robot cannot accurately determine whether a word slot value of the user belongs to a certain word slot type, and can confirm the word slot information by asking a counter-question. In this way, the user uses the second message to answer the chat robot's counter-question (for example, confirming, denying, changing a word slot, etc.), so that the chat robot can accurately identify the user's intention.
IV. system counter-question clarification --- user selects a candidate result in the counter-question
In the case where the chat robot recognizes multiple candidate results, it can actively ask a counter-question so that the user clarifies, and the user can give feedback on the result to continue the dialogue. For example, in a scenario where the second message semantically clarifies the first message, if the first reply provided by the chat robot to the user is a question that includes multiple candidate results, the user can answer the chat robot's counter-question through the second message. In some embodiments, the second message may include a selection among the multiple candidate results in the first reply, and the chat robot generates a second reply to the second message based on the user's selection. For example, Tables 12-14 below show dialogue examples in which the user selects a candidate result in the chat robot's counter-question through the second message.
Table 12
Table 13
Table 14
In the examples of Tables 12-14, the chat robot recognizes multiple candidate results in the first reply, and the chat robot can actively ask the user a counter-question so that the user selects one of the candidate results (for example, by its ranking position, or by using a keyword related to the candidate result), thereby achieving the effect of the user passively clarifying the first message. In this way, the user uses the second message to select a candidate result in the chat robot's counter-question, so that the chat robot can accurately identify the user's intention.
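Selecting a candidate by ranking position or by a related keyword can be sketched as follows. This is a hedged illustration only: the function `select_candidate`, the small `ORDINALS` table, and the candidate strings are hypothetical and do not reproduce Tables 12-14. The keyword branch accepts a match only when exactly one candidate shares a word with the user's reply.

```python
import re
from typing import List, Optional

# Hypothetical mapping from ordinal words to zero-based ranking positions.
ORDINALS = {"first": 0, "second": 1, "third": 2}


def select_candidate(candidates: List[str], reply: str) -> Optional[str]:
    """Resolve the user's reply to one candidate, or None if it stays ambiguous."""
    text = reply.lower()
    # 1) Selection by ranking position, e.g. "the second one".
    for word, index in ORDINALS.items():
        if re.search(rf"\b{word}\b", text) and index < len(candidates):
            return candidates[index]
    # 2) Selection by a one-based number, e.g. "2".
    number = re.search(r"\b(\d+)\b", text)
    if number:
        index = int(number.group(1)) - 1
        if 0 <= index < len(candidates):
            return candidates[index]
    # 3) Selection by keyword: accept only an unambiguous match.
    words = set(re.findall(r"\w+", text))
    matches = [c for c in candidates if words & set(re.findall(r"\w+", c.lower()))]
    return matches[0] if len(matches) == 1 else None


options = ["movie Blame sincere not faze", "variety show Blame sincere not faze"]
print(select_candidate(options, "the second one"))
print(select_candidate(options, "the movie"))
```

When the function returns `None`, a chat robot built along these lines would have to ask again or offer other candidates, as discussed in the embodiments that follow.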
In some embodiments, during the chat dialogue between the user and the chat robot, when the user wishes to change a recognized word slot result, the user can modify it directly through a natural language message. For example, if the user's first message involves booking a plane ticket to Beijing and the chat robot's first reply queries flights to Beijing, and the user's subsequent second message changes the destination word slot to Nanjing, then the chat robot's second reply is changed to query flights to Nanjing. In addition, when the chat robot misidentifies or fails to identify the intention, is unsure of the recognition result, or recognizes multiple results, and therefore actively asks the user a counter-question, the user can also directly express a new intention, and the chat robot can accurately identify the new intention rather than remaining in the previous clarification process.
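The two behaviours in this paragraph, changing a word slot while keeping the intention and abandoning the clarification process when a new intention is expressed, can be sketched as a small state update. The `DialogueState` class, the `apply_user_update` function, and the intention and slot names are assumptions made for illustration and are not defined in the disclosure.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class DialogueState:
    """Hypothetical dialogue state: the current intention and its word slots."""
    intent: str
    slots: Dict[str, str] = field(default_factory=dict)


def apply_user_update(
    state: DialogueState,
    new_intent: Optional[str] = None,
    slot_changes: Optional[Dict[str, str]] = None,
) -> DialogueState:
    """Return a new state reflecting the user's second message."""
    changes = dict(slot_changes or {})
    if new_intent is not None and new_intent != state.intent:
        # A new intention replaces the old one; earlier slots no longer apply.
        return DialogueState(new_intent, changes)
    # Same intention: later slot values override earlier ones.
    return DialogueState(state.intent, {**state.slots, **changes})


state = DialogueState("book_flight", {"destination": "Beijing"})
state = apply_user_update(state, slot_changes={"destination": "Nanjing"})
print(state)
```

Returning a new state instead of mutating the old one keeps the original recognition result available, which is convenient if the pair of old and corrected states is later stored as annotation.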
In some embodiments, the second message of the user may be abusive language. In this case, based on the abusive second message, the chat robot can determine that its first reply failed to accurately identify the user's intention, and the chat robot therefore uses a counter-question in the second reply to request a further message or confirmation from the user. For example, the user may use abusive language such as "idiot" or "too stupid" to reject the chat robot's counter-question.
In some embodiments, for a first reply of the chat robot in counter-question form, in the case where none of the candidate results in the counter-question is accurate, the user can directly deny them through a natural language message, for example by sending a message such as "no" or "not right". In addition, the user can also directly send new message content to make the correction. In this case, the chat robot can consider that the recognition result in the first reply does not match the user's intention, and will make an adjustment to provide more candidate results or to ask the user for further clarification or supplementation.
In some embodiments, if the chat robot still has not accurately identified the user's intention, the chat robot can continue to ask counter-questions, for example by issuing a second reply in counter-question form, a third reply in counter-question form, and so on, until an affirmative result is obtained from the user. Alternatively, a threshold number of counter-questions (for example, 3) can be set for the chat robot. After the threshold number is exceeded, even if the chat robot still has not accurately identified the user's intention, it does not continue to ask counter-questions, but ends the dialogue and tells the user that it still cannot understand what the user means. In some embodiments, if the user's second message expresses the intent to stop the chat dialogue, the chat robot stops the dialogue process accordingly.
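The bounded counter-question loop described here can be sketched in a few lines. This is an illustrative sketch, not the disclosed implementation: `clarify_with_limit`, the `ask_user` callback, the accepted answer words, and the question wording are all hypothetical, and the default limit of 3 simply mirrors the example threshold in the text.

```python
from typing import Callable, Iterable, Optional


def clarify_with_limit(
    candidates: Iterable[str],
    ask_user: Callable[[str], str],
    max_counter_questions: int = 3,
) -> Optional[str]:
    """Ask counter-questions until the user confirms or the limit is reached."""
    asked = 0
    for candidate in candidates:
        if asked >= max_counter_questions:
            break
        answer = ask_user(f"Do you want to {candidate}?").strip().lower()
        asked += 1
        if answer in {"yes", "right", "correct"}:
            return candidate  # affirmative result: the intention is identified
        if answer in {"stop", "quit", "exit"}:
            return None  # the user wants to end the chat dialogue
    # Limit reached or candidates exhausted: the caller ends the dialogue and
    # tells the user that the message is still not understood.
    return None


scripted = iter(["no", "yes"])
result = clarify_with_limit(
    ["order a train ticket", "order a plane ticket"],
    lambda question: next(scripted),
)
print(result)
```

Passing the user interaction in as a callback keeps the loop testable with scripted answers, as in the usage lines above.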
In some embodiments, in the case where the chat robot recognizes multiple candidate results, the chat robot provides the multiple candidate results in the first reply for the user to choose from. If none of these candidate results matches the user's intention, the user can directly reject them through the second message or instruct the chat robot to provide another batch of candidate results. Based on the user's second message, the chat robot goes on to generate a second reply in counter-question form to request further clarification or supplementation from the user.
Fig. 7 shows a block diagram of a device 700 for collecting training data according to an embodiment of the present disclosure. As shown in Fig. 7, the device 700 includes a first message obtaining module 710, a second message obtaining module 720, and a training data determining module 730. The first message obtaining module 710 is configured to obtain a first message from a user. The second message obtaining module 720 is configured to obtain a second message from the user for a first reply, wherein the first reply is generated based on the first message, and the first message and the second message are in natural language form. The training data determining module 730 is configured to, in response to determining that the second message semantically clarifies the first message, determine the first message as training data for training a chat dialogue.
In some embodiments, the training data determining module 730 includes: an annotation information determining module, configured to determine annotation information for the first message based on the second message clarifying the semantics of the first message; and an association determining module, configured to determine the first message and the annotation information in association as the training data.
In some embodiments, the annotation information determining module includes an intention determining module, which is configured to determine the intention type and word slot information of the first message, wherein the word slot information includes a word slot type and a word slot value.
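The data handled by these modules (a first message stored together with annotation information consisting of an intention type and word slots, each with a type and a value) can be sketched with a few data classes. The class and function names below, the `clarifies` flag standing in for the semantic determination, and the intention and slot labels are assumptions for illustration; the disclosure does not prescribe a concrete representation.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class WordSlot:
    slot_type: str   # e.g. "destination"
    slot_value: str  # e.g. "West Second Qi"


@dataclass(frozen=True)
class Annotation:
    intent_type: str
    word_slots: Tuple[WordSlot, ...] = ()


@dataclass(frozen=True)
class TrainingSample:
    message: str            # the user's first message, kept verbatim
    annotation: Annotation  # derived from the clarifying second message


def collect_training_sample(
    first_message: str,
    clarifies: bool,
    annotation: Optional[Annotation],
) -> Optional[TrainingSample]:
    """Keep the first message only if the second message clarified it."""
    if not clarifies or annotation is None:
        return None
    return TrainingSample(first_message, annotation)


# Table 5 style case: the mistyped first message is stored with the corrected slot.
sample = collect_training_sample(
    "I will go to west two strange",
    clarifies=True,
    annotation=Annotation("navigate", (WordSlot("destination", "West Second Qi"),)),
)
print(sample)
```

Storing the uncorrected first message is the point of the design: the misrecognized or mistyped input is exactly what the model should learn to interpret next time.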
In some embodiments, the device 700 further includes a declarative sentence providing module, configured to provide the first reply to the user, the first reply being in declarative sentence form and not accurately identifying the user's intention; and/or a question providing module, configured to provide the first reply to the user, the first reply being in question form.
In some embodiments, the device 700 further includes: a first generating module, configured to, in response to the second message including additional information for explaining the first message, generate a second reply to the second message based on the additional information; and a first providing module, configured to provide the second reply to the user.
In some embodiments, the device 700 further includes: a second generating module, configured to, in response to the second message including a correction of a recognition error or typing error in the first message, generate a second reply to the second message based on the correction; and a second providing module, configured to provide the second reply to the user.
In some embodiments, the device 700 further includes: a third generating module, configured to, in response to the second message including an answer to the counter-question in the first reply, generate a second reply to the second message based on the answer; and a third providing module, configured to provide the second reply to the user.
In some embodiments, the third generating module includes a fourth generating module, which is configured to generate the second reply based on the second message's selection among multiple candidate results in the first reply.
It should be appreciated that the first message obtaining module 710, the second message obtaining module 720, and the training data determining module 730 shown in Fig. 7 can be included in the annotation robot 130 described with reference to Fig. 1 or in the chat robot 120 (in the case where the chat robot 120 has an annotation function). Furthermore, it should be understood that the modules shown in Fig. 7 can perform the steps or actions in the methods or processes of the embodiments of the present disclosure.
Fig. 8 shows a schematic block diagram of an example equipment 800 that can be used to implement embodiments of the present disclosure. It should be understood that the equipment 800 can be used to implement the device 700 for collecting training data, the chat robot 120, or the annotation robot 130 described in the present disclosure. As shown, the equipment 800 includes a central processing unit (CPU) 801, which can perform various appropriate actions and processing according to computer program instructions stored in a read-only memory (ROM) 802 or computer program instructions loaded from a storage unit 808 into a random access memory (RAM) 803. The RAM 803 can also store various programs and data required for the operation of the equipment 800. The CPU 801, the ROM 802, and the RAM 803 are connected to one another through a bus 804. An input/output (I/O) interface 805 is also connected to the bus 804.
Multiple components in the equipment 800 are connected to the I/O interface 805, including: an input unit 806, such as a keyboard or a mouse; an output unit 807, such as various types of displays and loudspeakers; a storage unit 808, such as a magnetic disk or an optical disc; and a communication unit 809, such as a network interface card, a modem, or a wireless communication transceiver. The communication unit 809 allows the equipment 800 to exchange information/data with other equipment through a computer network such as the Internet and/or various telecommunication networks.
The processing unit 801 performs the methods and processes described above. For example, in some embodiments, a method or process can be implemented as a computer software program that is tangibly embodied in a machine-readable medium, such as the storage unit 808. In some embodiments, part or all of the computer program can be loaded and/or installed onto the equipment 800 via the ROM 802 and/or the communication unit 809. When the computer program is loaded into the RAM 803 and executed by the CPU 801, one or more actions or steps of the methods or processes described above can be performed. Alternatively, in other embodiments, the CPU 801 can be configured to perform the methods or processes in any other appropriate manner (for example, by means of firmware).
The functions described herein can be performed, at least in part, by one or more hardware logic components. For example, and without limitation, exemplary types of hardware logic components that can be used include: field programmable gate arrays (FPGA), application-specific integrated circuits (ASIC), application-specific standard products (ASSP), systems on chip (SOC), complex programmable logic devices (CPLD), and so on.
Program code for implementing the methods of the present disclosure can be written in any combination of one or more programming languages. The program code can be provided to a processor or controller of a general-purpose computer, a special-purpose computer, or another programmable data processing apparatus, so that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowcharts and/or block diagrams to be implemented. The program code can execute entirely on a machine, partly on a machine, as a stand-alone software package partly on a machine and partly on a remote machine, or entirely on a remote machine or server.
In the context of the present disclosure, a machine-readable medium can be a tangible medium that can contain or store a program for use by, or in combination with, an instruction execution system, apparatus, or device. The machine-readable medium can be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium can include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
In addition, although the actions or steps are depicted in a particular order, this should not be understood as requiring that such actions or steps be performed in the particular order shown or in sequential order, or that all illustrated actions or steps be performed, to obtain the desired result. Under certain circumstances, multitasking and parallel processing may be advantageous. Similarly, although the above discussion contains several specific implementation details, these should not be construed as limiting the scope of the present disclosure. Certain features described in the context of separate embodiments can also be implemented in combination in a single implementation. Conversely, various features described in the context of a single implementation can also be implemented in multiple implementations individually or in any suitable sub-combination.
Although the embodiments of the present disclosure have been described in language specific to structural features and/or methodological logical actions, it should be appreciated that the subject matter defined in the appended claims is not necessarily limited to the specific features or actions described above. Rather, the specific features and actions described above are merely example forms of implementing the claims.

Claims (18)

1. A method for collecting training data, comprising:
obtaining a first message from a user;
obtaining a second message from the user for a first reply, the first reply being generated based on the first message, and the first message and the second message being in natural language form; and
in response to determining that the second message semantically clarifies the first message, determining the first message as training data for training a chat dialogue.
2. The method according to claim 1, wherein determining the first message as the training data comprises:
determining annotation information for the first message based on the second message clarifying the semantics of the first message; and
determining the first message and the annotation information in association as the training data.
3. The method according to claim 2, wherein determining the annotation information for the first message comprises:
determining the intention type and word slot information of the first message, the word slot information including a word slot type and a word slot value.
4. The method according to claim 1, further comprising at least one of the following:
providing the first reply to the user, the first reply being in declarative sentence form and not accurately identifying the user's intention; and
providing the first reply to the user, the first reply being in question form.
5. The method according to claim 1, further comprising:
in response to the second message including additional information for explaining the first message, generating a second reply to the second message based on the additional information; and
providing the second reply to the user.
6. The method according to claim 1, further comprising:
in response to the second message including a correction of a recognition error or typing error in the first message, generating a second reply to the second message based on the correction; and
providing the second reply to the user.
7. The method according to claim 1, further comprising:
in response to the second message including an answer to the counter-question in the first reply, generating a second reply to the second message based on the answer; and
providing the second reply to the user.
8. The method according to claim 7, wherein generating the second reply based on the answer further comprises:
generating the second reply based on the second message's selection among multiple candidate results in the first reply.
9. A device for collecting training data, comprising:
a first message obtaining module, configured to obtain a first message from a user;
a second message obtaining module, configured to obtain a second message from the user for a first reply, the first reply being generated based on the first message, and the first message and the second message being in natural language form; and
a training data determining module, configured to, in response to determining that the second message semantically clarifies the first message, determine the first message as training data for training a chat dialogue.
10. The device according to claim 9, wherein the training data determining module comprises:
an annotation information determining module, configured to determine annotation information for the first message based on the second message clarifying the semantics of the first message; and
an association determining module, configured to determine the first message and the annotation information in association as the training data.
11. The device according to claim 10, wherein the annotation information determining module comprises:
an intention determining module, configured to determine the intention type and word slot information of the first message, the word slot information including a word slot type and a word slot value.
12. The device according to claim 10, further comprising at least one of the following:
a declarative sentence providing module, configured to provide the first reply to the user, the first reply being in declarative sentence form and not accurately identifying the user's intention; and
a question providing module, configured to provide the first reply to the user, the first reply being in question form.
13. The device according to claim 10, further comprising:
a first generating module, configured to, in response to the second message including additional information for explaining the first message, generate a second reply to the second message based on the additional information; and
a first providing module, configured to provide the second reply to the user.
14. The device according to claim 10, further comprising:
a second generating module, configured to, in response to the second message including a correction of a recognition error or typing error in the first message, generate a second reply to the second message based on the correction; and
a second providing module, configured to provide the second reply to the user.
15. The device according to claim 10, further comprising:
a third generating module, configured to, in response to the second message including an answer to the counter-question in the first reply, generate a second reply to the second message based on the answer; and
a third providing module, configured to provide the second reply to the user.
16. The device according to claim 15, wherein the third generating module comprises:
a fourth generating module, configured to generate the second reply based on the second message's selection among multiple candidate results in the first reply.
17. An electronic equipment, comprising:
one or more processors; and
a storage device for storing one or more programs, wherein the one or more programs, when executed by the one or more processors, cause the electronic equipment to implement the method according to any one of claims 1-8.
18. A computer-readable storage medium having a computer program stored thereon, wherein the program, when executed by a processor, implements the method according to any one of claims 1-8.
CN201810553778.3A 2018-05-31 2018-05-31 Collect method, apparatus, equipment and the computer readable storage medium of training data Pending CN108763548A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810553778.3A CN108763548A (en) 2018-05-31 2018-05-31 Collect method, apparatus, equipment and the computer readable storage medium of training data

Publications (1)

Publication Number Publication Date
CN108763548A true CN108763548A (en) 2018-11-06

Family

ID=64001833

Country Status (1)

Country Link
CN (1) CN108763548A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105027197A (en) * 2013-03-15 2015-11-04 苹果公司 Training an at least partial voice command system
US20170169115A1 (en) * 2015-12-09 2017-06-15 Industrial Technology Research Institute Internet question answering system and method, and computer readable recording media
CN107463601A (en) * 2017-06-13 2017-12-12 北京百度网讯科技有限公司 Dialogue based on artificial intelligence understands system constituting method, device, equipment and computer-readable recording medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20181106
