CN108763548A - Method, apparatus, device and computer-readable storage medium for collecting training data
- Publication number
- CN108763548A; CN201810553778.3A
- Authority
- CN
- China
- Prior art keywords
- message
- user
- reply
- replied
- training data
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
Landscapes
- Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Theoretical Computer Science (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Artificial Intelligence (AREA)
- Evolutionary Biology (AREA)
- Evolutionary Computation (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Information Transfer Between Computers (AREA)
Abstract
According to example embodiments of the present disclosure, a method, apparatus, electronic device and computer-readable storage medium for collecting training data are provided. The method includes obtaining a first message from a user and obtaining a second message from the user directed at a first reply, where the first reply is generated based on the first message and both messages are in natural language form. The method further includes, in response to determining that the second message semantically clarifies the first message, determining the first message as training data for training chat conversations. According to embodiments of the present disclosure, when the robot's reply in a chat conversation is adjusted because of the user's clarification, the user's chat message can be automatically added to the training samples. High-quality training data is thereby collected directly from natural language dialogue, which reduces training cost and improves training efficiency.
Description
Technical field
Embodiments of the present disclosure relate generally to the field of artificial intelligence, and more particularly to a method, apparatus, electronic device and computer-readable storage medium for collecting training data.
Background
In recent years, the idea of "Conversation as a Platform" has gained broad acceptance, and more and more networked products and applications have begun to adopt conversational human-computer interaction. A chat robot is a computer program or piece of software that carries out human-computer interaction through text, voice, pictures and the like; it can understand the content a user sends and respond automatically. A chat robot can stand in for a human in conversation to a certain extent, and can be integrated into a dialogue system as an automatic online assistant for scenarios such as intelligent chat, customer service and information inquiry.
To make a chat robot more intelligent and able to chat in the manner of human conversation, it usually needs to be trained with training data. Training data is labeled data used to train a machine learning model, and can be used to improve the model's performance. In general, in the training process of a chat robot, dialogue sample annotation, model training and effect verification are performed separately as relatively independent functions. For example, model training can begin only after a certain volume of dialogue samples has been annotated for the business scenario. After the model is trained, someone must converse with the chat robot again to test and verify the effect, while manually recording the results in parallel to evaluate the chat robot's performance.
Summary
According to example embodiments of the present disclosure, a method, apparatus, electronic device and computer-readable storage medium for collecting training data are provided.
In a first aspect of the present disclosure, a method for collecting training data is provided. The method includes: obtaining a first message from a user; obtaining a second message from the user directed at a first reply, where the first reply is generated based on the first message, and the first message and the second message are in natural language form; and, in response to determining that the second message semantically clarifies the first message, determining the first message as training data for training chat conversations.
In a second aspect of the present disclosure, an apparatus for collecting training data is provided. The apparatus includes: a first message obtaining module configured to obtain a first message from a user; a second message obtaining module configured to obtain a second message from the user directed at a first reply, where the first reply is generated based on the first message, and the first message and the second message are in natural language form; and a training data determining module configured to, in response to determining that the second message semantically clarifies the first message, determine the first message as training data for training chat conversations.
In a third aspect of the present disclosure, an electronic device is provided, comprising one or more processors and a storage device for storing one or more programs. The one or more programs, when executed by the one or more processors, cause the electronic device to implement methods or processes according to embodiments of the present disclosure.
In a fourth aspect of the present disclosure, a computer-readable medium is provided, on which a computer program is stored. The program, when executed by a processor, implements methods or processes according to embodiments of the present disclosure.
It should be understood that the content described in this part of the disclosure is not intended to identify key or important features of embodiments of the present disclosure, nor to limit the scope of the present disclosure. Other features of the present disclosure will become readily understood from the following description.
Brief description of the drawings
The above and other features, advantages and aspects of embodiments of the present disclosure will become more apparent from the following detailed description taken in conjunction with the accompanying drawings. In the drawings, the same or similar reference numerals denote the same or similar elements, wherein:
Fig. 1 shows a schematic diagram of an example environment in which embodiments of the present disclosure can be implemented;
Fig. 2 shows a diagram of a graphical user interface (GUI) of an example dialogue between a user and a chat robot according to an embodiment of the present disclosure;
Fig. 3 shows a flow chart of a method for collecting training data according to an embodiment of the present disclosure;
Fig. 4 shows a flow chart of a method for obtaining training samples according to an embodiment of the present disclosure;
Fig. 5 shows a schematic diagram of annotation information for a user message according to an embodiment of the present disclosure;
Fig. 6 shows a flow chart of a process of clarifying a first message using a second message according to the present disclosure;
Fig. 7 shows a block diagram of an apparatus for collecting training data according to an embodiment of the present disclosure; and
Fig. 8 shows a block diagram of an electronic device capable of implementing multiple embodiments of the present disclosure.
Detailed description
Embodiments of the present disclosure are described in more detail below with reference to the accompanying drawings. Although certain embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure can be implemented in various forms and should not be construed as limited to the embodiments set forth here; rather, these embodiments are provided so that the present disclosure will be understood more thoroughly and completely. It should be understood that the drawings and embodiments of the present disclosure are for illustrative purposes only and are not intended to limit the scope of protection of the present disclosure.
In the description of embodiments of the present disclosure, the term "include" and similar terms should be understood as open-ended inclusion, that is, "including but not limited to". The term "based on" should be understood as "based at least in part on". The term "one embodiment" should be understood as "at least one embodiment". Other explicit and implicit definitions may also be included below.
During training, a traditional dialogue system requires the user message to be entered when annotated samples are added, and entered again when the effect is verified; this clearly duplicated work makes development inefficient. One improvement on the traditional method completes dialogue sample verification, dialogue sample annotation and dialogue effect evaluation together in a single GUI. However, this improved method requires waking up a trainer to modify the annotation information in real time (using a shortcut key or by @-mentioning the trainer), and accepts only fixed formats (for example, the format for negating an intent is SYS_OTHER). It also requires switching back and forth between keyboard and mouse, which makes input complicated and inefficient. In addition, the scope supported by the trainer method is limited: it can correct only the intents and word slots of complex dialogue logic, and cannot correct the response units required in fixed question-and-answer scenarios. Moreover, the trainer method is aimed mainly at developers and data specialists, and is unsuited to end users who encounter various abnormal situations in real scenarios. Consequently, collecting training data with conventional methods is costly and very inefficient.
Embodiments of the present disclosure propose a scheme for quickly collecting training data. According to embodiments of the present disclosure, when the robot's reply in a chat conversation is adjusted because of the user's clarification, the user's chat message can be automatically added to the training samples, without the user having to wake up a trainer or use instructions in a predetermined format. Embodiments of the present disclosure can therefore collect high-quality training data directly from natural language dialogue, which both reduces training cost and improves training efficiency. With the method according to embodiments of the present disclosure, the user can annotate the dialogue in the GUI and experience a better dialogue effect, and can correct the chat robot's erroneous recognition results through the dialogue itself.
Embodiments of the present disclosure therefore hand the process of correcting the chat robot over to the end user, and the error-correcting annotation is accomplished during the user's normal natural language dialogue, so that high-quality annotated data can be collected quickly to help optimize the chat robot. In addition, with the method according to embodiments of the present disclosure, the data in the question-and-answer library of a fixed question-and-answer scenario can also be optimized based on the user's corrections. Some example embodiments of the present disclosure are described in detail below with reference to Figs. 1-8.
Fig. 1 shows a schematic diagram of an example environment 100 in which embodiments of the present disclosure can be implemented. In example environment 100, user 110 conducts chat conversation 115 with chat robot 120 (also known as a "chat engine"). Optionally, user 110 may be local to chat robot 120, that is, user 110 can converse with chat robot 120 directly. Alternatively, user 110 may use a local device (a laptop computer, desktop computer, smartphone, tablet computer, etc.) to conduct the chat conversation with chat robot 120 over a network. It should be understood that chat robot 120 may be deployed in a local electronic device, in the cloud, or in a distributed manner.
With reference to Fig. 1, environment 100 further includes annotation robot 130. Annotation robot 130 can obtain dialogue samples 125 from chat robot 120 in real time or in non-real time, and automatically annotate certain dialogue samples identified using embodiments of the present disclosure. In a non-real-time scenario, annotation robot 130 can identify dialogue samples that can be annotated from the chat conversation history. In a real-time scenario, annotation robot 130 can analyze in real time, for each user message, whether it can be added to the dialogue samples. After annotation is complete, annotation robot 130 sends annotated data 135 to training dataset 140, which can be used for training 145 chat robot 120 so as to improve the accuracy and intelligence of chat robot 120. Besides obtaining training data from user dialogues, manually and/or automatically annotated training data can also be obtained from big data 150 for training chat robot 120.
Annotation robot 130 may be local to chat robot 120 or remote from it (for example, deployed in the cloud or in a distributed manner). It should be understood that although chat robot 120 and annotation robot 130 are shown separately in Fig. 1, annotation robot 130 may also be integrated with chat robot 120 to form a chat dialogue system. In addition, chat robot 120 may itself integrate the function of automatically annotating dialogue samples, in which case chat robot 120 can also annotate data in real time or in non-real time.
Fig. 2 shows a diagram of an example GUI 200 of an example dialogue between a user and a chat robot according to an embodiment of the present disclosure. For example, the chat conversation shown in GUI 200 may be chat conversation 115 between user 110 and chat robot 120 described above with reference to Fig. 1.
As shown in Fig. 2, after user 110 opens the chat window with chat robot 120, chat robot 120 can first send greeting message 201 (for example, "Hello, I am a chat robot. You can talk with me now. Give it a try."). Besides chat conversation 201-224, GUI 200 further includes window 250 in which user 110 enters messages and button 260 for sending messages. It should be understood that user 110 can also converse with chat robot 120 by other means (for example, voice).
With reference to 211-212 in Fig. 2, user 110 can initiate a first round of chat dialogue with chat robot 120 in the chat window. For example, user 110 sends query message 211 about the weather (for example, "How is the weather in Beijing tomorrow?"), and chat robot 120 correctly identifies the user's intent to query the weather and provides reply 212 (for example, "Beijing will be sunny tomorrow, with temperatures of 18-33 degrees.").
With continued reference to Fig. 2, user 110 can also initiate a second round of dialogue 220 with chat robot 120. For example, user 110 sends message 221 about entertainment (for example, "I want to watch 《Blame sincere not faze》"). Chat robot 120 cannot correctly identify the user's intent at this point, and so provides reply 222 indicating that the intent could not be identified (for example, "Sorry, I do not understand what you mean."). Next, user 110 sends new message 223 to clarify the previous message 221 (for example, "I want to watch a movie"). Because message 223 semantically clarifies message 221, chat robot 120 can accurately identify that the user's intent is to watch a movie, and can then provide an accurate reply 224 (for example, "Playing the film 《Blame sincere not faze》.").
In the example GUI 200 of Fig. 2, in the first round of dialogue 211-212, chat robot 120 correctly parses message 211 and accurately identifies the intent of user 110, so this dialogue sample usually does not need to be annotated again for training chat robot 120. In the second round of dialogue 220 (that is, 221-224), by contrast, chat robot 120 cannot accurately identify the intent of the user's message 221; only after the user clarifies message 221 through message 223 can chat robot 120 accurately identify the intent of user 110. This dialogue sample is therefore better suited to being annotated for training chat robot 120. A chat robot trained with dialogue 220 can, the next time it encounters message 221 or a similar message, determine the user's intent directly from that message, without the user having to send a further clarifying message 223. Embodiments of the present disclosure can therefore collect high-quality training data directly from natural language dialogue without additional manual annotation by the user, which improves the efficiency of collecting training data. Moreover, the training data collected according to embodiments of the present disclosure usually involves real abnormal cases (bad cases) that the chat robot could not identify accurately, so such training data is especially effective at improving the accuracy and intelligence of the chat robot.
Fig. 3 shows a flow chart of a method 300 for collecting training data according to an embodiment of the present disclosure. It should be understood that method 300 can be executed by annotation robot 130 described above with reference to Fig. 1. In addition, where chat robot 120 has an annotation function, method 300 can also be executed by chat robot 120.
In block 302, a first message from the user (that is, the user message of the first chat turn) is obtained. For example, with reference to Fig. 1 and Fig. 2, annotation robot 130 obtains message 221 from user 110 in real time or in non-real time. Chat robot 120 can generate a corresponding reply 222 for message 221, where reply 222 indicates that chat robot 120 failed to accurately identify the user's intent, for example because it could not identify the intent or identified it incorrectly.
In block 304, a second message from the user directed at the first reply (that is, the user message of the second chat turn) is obtained, where the first reply is generated based on the first message, and the first message and the second message are in natural language form. For example, with reference to Fig. 1 and Fig. 2, annotation robot 130 obtains, in real time or in non-real time, message 223 from user 110 directed at reply 222. In embodiments of the present disclosure, messages from user 110 are in natural language form rather than a fixed machine-language format, so the user needs no prior knowledge or training. For example, chat robot 120 can generate reply 224 for message 223. In some embodiments, reply 224 matches the intent of user 110 better than reply 222 does.
In block 306, in response to determining that the second message semantically clarifies the first message, the first message is determined as training data for training chat conversations. For example, message 223 in Fig. 2 semantically clarifies message 221: user 110 tells chat robot 120 that the intent is to watch the film 《Blame sincere not faze》. In this case, chat robot 120 initially could not parse message 221 correctly, and identified the intent of user 110 only after message 223 clarified message 221. Such a message 221 is therefore better suited to being annotated for training chat robot 120. Several example situations in which the second message semantically clarifies the first message are described below with reference to Fig. 6.
Method 300 according to embodiments of the present disclosure can automatically collect high-quality training data from natural language dialogue, which reduces training cost and improves training efficiency.
Fig. 4 shows a flow chart of a method 400 for obtaining training samples according to an embodiment of the present disclosure. It should be understood that method 400 can be executed by annotation robot 130 described above with reference to Fig. 1. In addition, where chat robot 120 has an annotation function, method 400 can also be executed by chat robot 120. Blocks 402-406 may be an example implementation of block 306 described above with reference to Fig. 3.
In block 402, the dialogue samples to be annotated in the chat conversation are identified. For example, annotation robot 130 obtains dialogue samples in real time or in non-real time, identifies situations in which the user's second message semantically clarifies the first message, and determines such dialogues as the dialogue samples to be annotated. Several example situations in which the second message semantically clarifies the first message are described below with reference to Fig. 6. For example, annotation robot 130 can identify situations in which the reply of chat robot 120 was changed or adjusted based on the clarification by user 110.
In block 404, annotation information for the user message is determined. The annotation information is for the first message, but it is obtained based on the second message that clarifies the first message. In other words, the message of the first turn is annotated based on the result after the intervention of the second turn. In some embodiments, the annotation information may include an intent type and word slot information, and the word slot information may include word slot types and word slot values. An example of annotation information is illustrated below with reference to Fig. 5.
In block 406, the user message and the annotation information are added to the training samples in association with each other. For example, annotation robot 130 adds the annotated data (which includes the user message and the corresponding annotation information) to training dataset 140 for training chat robot 120. For example, the intent type corresponding to message 221 in Fig. 2 is annotated as watching a movie. The next time chat robot 120 receives a message identical or similar to message 221, it can accurately identify that the user's intent is to watch a movie. In this way, method 400 according to embodiments of the present disclosure not only collects dialogue samples for training but also annotates them automatically, which improves the efficiency of obtaining training data.
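Blocks 404 and 406 can be sketched as follows. The key point is that the text of the sample comes from the first message while its label comes from the recognition result obtained after the clarifying second message. The dictionary layout, the function names and the `watch_movie` / `title` labels are assumptions made for illustration; the source does not prescribe a storage format.

```python
from typing import Dict, List


def build_sample(first_message: str, second_round_result: Dict) -> Dict:
    """Block 404: label the first message with the result reached after clarification."""
    return {
        "text": first_message,
        "intent": second_round_result["intent"],
        # Copy the slots so later changes to the result do not alter the sample.
        "slots": dict(second_round_result.get("slots", {})),
    }


def add_to_training_set(training_set: List[Dict], sample: Dict) -> None:
    """Block 406: store message and annotation together, skipping exact duplicates."""
    if sample not in training_set:
        training_set.append(sample)


# Hypothetical recognition result after message 223 clarified message 221.
result_after_clarification = {
    "intent": "watch_movie",
    "slots": {"title": "Blame sincere not faze"},
}

training_set: List[Dict] = []
sample = build_sample("I want to watch 《Blame sincere not faze》", result_after_clarification)
add_to_training_set(training_set, sample)
print(training_set)
```

Skipping duplicates is a design choice of this sketch, not something the source requires; an implementation might instead count repeats as a signal of how common a bad case is.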
Fig. 5 shows a schematic diagram 500 of annotation information for a user message according to an embodiment of the present disclosure. As shown in Fig. 5, for user message 510 (for example, "Book me a private room at Quanjude for 6 o'clock tonight, ten people"), annotation information 520 can be determined as follows: the intent type is booking a restaurant, and this intent includes three word slots, namely restaurant name, time and number of people, whose word slot values are "Quanjude", "2018.05.29 18:00" and "10" respectively. In a scenario according to embodiments of the present disclosure in which the second message semantically clarifies the first message, the chat robot does not accurately recognize the first message (for example, it cannot annotate it or annotates it incorrectly), and only after the second message clarifies the first message can the chat robot annotate the first message accurately. In other words, the user message of the first turn is annotated based on the result after the intervention of the second turn.
In some embodiments, the word slots in each intent type can be configured. For example, for a navigation intent, the word slots may include a starting point and a destination. Optionally, the word slots of the navigation intent may also include a waypoint as an optional word slot. In some embodiments, if the first message already includes word slot values but the chat robot can determine only the user's intent type and not all of the word slot values, the chat robot can ask the user a clarifying question, or the user can actively clarify in response to the reply the chat robot provides.
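The annotation information of Fig. 5 and the per-intent word slot configuration can be represented with plain dictionaries, as in the sketch below. The intent and slot identifiers (`book_restaurant`, `party_size`, and so on) are hypothetical, and treating the starting point and destination as required slots is an assumption; the source only says that a waypoint is optional.

```python
from typing import Dict, List

# Hypothetical word slot configuration for each intent type.
INTENT_SLOTS: Dict[str, Dict[str, List[str]]] = {
    "book_restaurant": {
        "required": ["restaurant", "time", "party_size"],
        "optional": [],
    },
    "navigate": {
        "required": ["origin", "destination"],
        "optional": ["waypoint"],
    },
}


def missing_slots(intent: str, slots: Dict[str, str]) -> List[str]:
    """List the required word slots of `intent` that have no value yet."""
    required = INTENT_SLOTS[intent]["required"]
    return [name for name in required if not slots.get(name)]


# Annotation information 520 for user message 510 in Fig. 5.
annotation = {
    "intent": "book_restaurant",
    "slots": {
        "restaurant": "Quanjude",
        "time": "2018.05.29 18:00",
        "party_size": "10",
    },
}

print(missing_slots(annotation["intent"], annotation["slots"]))
print(missing_slots("navigate", {"destination": "West Second Qi"}))
```

A non-empty result from `missing_slots` corresponds to the situation described above in which the chat robot knows the intent type but not every word slot value, and so either asks the user or waits for the user to clarify.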
Fig. 6 shows a flow chart of a process 600 of clarifying a first message using a second message according to the present disclosure. It should be understood that Fig. 6 shows some example situations in which the user's second message, described above with reference to Figs. 1-4, semantically clarifies the first message.
In block 602, the user sends a first message to the chat robot. In block 604, the chat robot provides the user with a first reply to the first message. Next, at 606, it is judged whether the first reply is a question. If the first reply is not a question (for example, it is a declarative sentence), the user needs to take the initiative to clarify: at 608, the user actively clarifies and sends a second message to the chat robot. Then, at 610, based on the user's active clarification, the chat robot provides the user with a second reply that better matches the user's intent. If the first reply is judged at 606 to be a question, the chat robot is the one actively asking and the user clarifies only passively: at 612, the user passively clarifies and sends a second message to the chat robot. Then, at 614, based on the user's answer to the question in the first reply, the chat robot provides the user with a second reply that better matches the user's intent.
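The branch at 606 can be sketched as a small classifier over the first reply. Detecting a question by its trailing question mark is a deliberately naive assumption made for illustration; the source does not say how the judgment is made, and the function name and the example question are hypothetical.

```python
def clarification_mode(first_reply: str) -> str:
    """Decide which branch of process 600 applies to the user's second message.

    Returns "passive" when the first reply is a question (blocks 612-614)
    and "active" when it is not (blocks 608-610).
    """
    text = first_reply.strip()
    # Naive check covering both the ASCII and the full-width question mark.
    if text.endswith(("?", "？")):
        return "passive"
    return "active"


print(clarification_mode("Sorry, I do not understand what you mean."))
print(clarification_mode("Do you mean the film or the variety show?"))
```

In either branch the second message is still treated as a clarification of the first; the mode only records whether the user or the chat robot initiated it.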
Several example situations in which the second message semantically clarifies the first message are shown next. When the user actively clarifies (the first reply is a declarative sentence), the user may actively explain or actively correct. When the user passively clarifies (the first reply is a question), the user may answer the question or select one of the candidate results offered in the question. It should be understood that the following situations are merely examples for describing embodiments of the present disclosure and do not limit its scope. The scope of the present disclosure is defined by the appended claims.
I. User actively clarifies: user actively explains
Where the chat robot misidentifies or cannot identify the user's intent, the user can actively explain. For example, in a scenario in which the second message semantically clarifies the first message, if the first reply that the chat robot provides to the user is a declarative sentence and does not accurately identify the user's intent, the user can actively explain through the second message. In some embodiments, the second message may include additional information that explains the first message, and the chat robot generates a second reply to the second message based on that additional information. For example, Tables 1-3 below show example dialogues in which the user actively explains the first message through the second message.
Table 1
First message from the user: | I want to watch 《Blame sincere not faze》 |
First reply from the chat robot: | Sorry, I do not understand what you mean |
Second message from the user: | I mean I want to watch a movie |
Second reply from the chat robot: | Playing the film 《Blame sincere not faze》 |
Table 2
Table 3
First message from the user: | West Second Qi, depart |
First reply from the chat robot: | Going to West Second Qi |
Second message from the user: | I mean departing from West Second Qi |
Second reply from the chat robot: | Departing from West Second Qi. Where would you like to go? |
In the example of Table 1, the first reply of the chat robot fails to identify the user's intent. The user's second message actively explains that the intent is to watch a movie, and based on this explanation and clarification the chat robot can determine that the user's intent is to watch the film 《Blame sincere not faze》. In the example of Table 2, the first reply of the chat robot misidentifies the user's intent as playing the variety show 《Blame sincere not faze》. The user's second message explains that the real intent is to watch the film rather than the variety show, and based on this explanation and clarification the chat robot can determine that the user's intent is to watch the film 《Blame sincere not faze》. In the example of Table 3, the chat robot initially misidentifies a word slot type in the user's intent (the place of departure "West Second Qi" is identified as the destination), and the user's second message actively explains that "West Second Qi" is the place of departure rather than the destination. In this way, the user explains the first message with a second message in natural language form, so that the chat robot can accurately identify the user's intent.
It should be understood that explaining means clarifying on the basis of the original message, whereas supplementing means adding entirely new content on top of the original; the two differ in meaning. Consider a case in which some word slots are missing: the first message involves only the intent to book a ticket, with no destination word slot value, and the second message contains the destination word slot value. This is a case of the second message supplementing the first message.
II. User actively clarifies: user actively corrects
Where the chat robot misidentifies or cannot identify the user's intent, the user can actively correct. For example, in a scenario in which the second message semantically clarifies the first message, if the first reply that the chat robot provides to the user is a declarative sentence and does not accurately identify the user's intent, the user can actively correct through the second message. In some embodiments, the second message may include a correction of a recognition error or typing error in the first message, and the chat robot can generate a second reply to the second message based on the correction. For example, Tables 4-8 below show example dialogues in which the user actively corrects the first message through the second message.
Table 4
First message from the user: | I want to go to West Second Qi |
First reply from the chat robot: | Sorry, I do not understand what you mean |
Second message from the user: | Navigate to West Second Qi |
Second reply from the chat robot: | Querying the navigation route to West Second Qi |
Table 5
First message from the user: | I want to go to west two strange |
First reply from the chat robot: | Sorry, I do not understand what you mean |
Second message from the user: | It is West Second Qi |
Second reply from the chat robot: | Querying the navigation route to West Second Qi |
Table 6
User's first message: | Look up Wang Feng's contact information |
Chat robot's first reply: | Looking up Wang Feng's contact information, please wait |
User's second message: | It is the "Feng" in "bumper harvest" |
Chat robot's second reply: | Looking up Wang Feng's contact information, please wait |
Table 7
User's first message: | I want to go to practise two very |
Chat robot's first reply: | Sorry, I do not understand what you mean |
User's second message: | It is the "west" in "east and west" and the "flag" in "national flag" |
Chat robot's second reply: | Looking up the navigation route to West Second Qi |
Table 8
User's first message: | Call 13511652271 |
Chat robot's first reply: | Calling 13511652271 |
User's second message: | It is 110 |
Chat robot's second reply: | Calling 13511052271 |
In the examples of Tables 4, 5 and 7, the chat robot fails to identify the user's intention in the first reply; in the examples of Tables 6 and 8, the chat robot misidentifies the user's intention in the first reply. In response to the chat robot's first reply, the user uses the second message to actively correct the recognition error or typing error in the first message (for example, by correcting one or more words, digits, etc.), so that the chat robot can accurately identify the user's intention in the second reply. In addition, the second message may also completely or partially repeat the content of the first message. It should be understood that repetition also has a corrective effect: the next time a user sends the same message, the chat robot can identify it correctly.
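As a non-limiting illustration of the digit correction in Table 8, the following Python sketch applies a short correction fragment to a previously recognized value. The function name `apply_correction` and the matching rule (replace the window of the original that agrees with the fragment in the most character positions) are assumptions made for this example; the disclosure does not specify how a correction is aligned with the first message.

```python
def apply_correction(original: str, correction: str) -> str:
    """Replace the best-matching window of `original` with `correction`.

    Hypothetical helper: the "best" window is the one that agrees with the
    correction in the most character positions (ties go to the leftmost).
    """
    n = len(correction)
    if n == 0 or n > len(original):
        return original  # nothing sensible to replace

    best_start, best_score = 0, -1
    for start in range(len(original) - n + 1):
        window = original[start:start + n]
        # Count positions where the window and the correction agree.
        score = sum(a == b for a, b in zip(window, correction))
        if score > best_score:
            best_start, best_score = start, score

    return original[:best_start] + correction + original[best_start + n:]


# Table 8: the recognized number contains "116" where the user meant "110".
print(apply_correction("13511652271", "110"))
```

A position-wise comparison is used here only because it is easy to follow; a real system might align on phonetic or edit-distance similarity instead.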
III. System counter-question clarification: the user answers the counter-question
When the chat robot is unsure of the user's intention, it can actively ask the user a counter-question to request clarification, and the user can give feedback on the result to continue the dialogue. For example, in the scenario where the second message semantically clarifies the first message, if the first reply provided to the user by the chat robot is an interrogative sentence, the user can answer the chat robot's counter-question through the second message. In some embodiments, the second message may include an answer to the counter-question in the first reply, and the chat robot generates a second reply to the second message based on the answer. For example, Tables 9 to 11 below show example dialogues in which the user answers the chat robot's counter-question through the second message.
Table 9
Table 10
Table 11
In the example of Table 9, the chat robot recognizes only one result in the first reply, and its confidence is not high enough (namely, booking a train ticket), so the chat robot asks the user a counter-question to confirm. Depending on whether the result in the counter-question is accurate, the user can affirm or deny it in natural language to continue the dialogue; for example, once the user confirms that the intention is to book a train ticket, the chat robot has accurately identified that intention. In the examples of Tables 10 and 11, the chat robot cannot determine whether the user's word slot value belongs to a certain word slot type, and can confirm the word slot information by asking a counter-question. In this way, the user uses the second message to answer the chat robot's counter-question (for example, confirming, denying, or changing a word slot), so that the chat robot can accurately identify the user's intention.
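As a non-limiting illustration, the following Python sketch classifies a user's answer to a counter-question as a confirmation, a denial, or something else (for example, a changed word slot or a new message). The function name `interpret_answer` and the two small English keyword sets are assumptions for this example only; the disclosure does not prescribe how affirmation or denial is detected.

```python
import re

# Hypothetical keyword sets, chosen only for this example.
AFFIRMATIVE = {"yes", "yeah", "right", "correct"}
NEGATIVE = {"no", "not", "wrong", "incorrect"}


def interpret_answer(reply: str) -> str:
    """Return "confirm", "deny", or "other" for an answer to a counter-question."""
    words = re.findall(r"[a-z']+", reply.lower())
    # Negation is checked first so that "not right" counts as a denial.
    if any(word in NEGATIVE for word in words):
        return "deny"
    if any(word in AFFIRMATIVE for word in words):
        return "confirm"
    return "other"


print(interpret_answer("Yes, a train ticket"))
print(interpret_answer("Not right"))
print(interpret_answer("A plane ticket to Nanjing"))
```

An answer classified as "other" would be handed to whatever logic extracts a changed word slot or a new intention.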
IV. System counter-question clarification: the user selects a candidate result in the counter-question
When the chat robot recognizes multiple candidate results, it can actively ask the user a counter-question to request clarification, and the user can give feedback on the results to continue the dialogue. For example, in the scenario where the second message semantically clarifies the first message, if the first reply provided to the user by the chat robot is an interrogative sentence that includes multiple candidate results, the user can answer the chat robot's counter-question through the second message. In some embodiments, the second message may include a selection among the multiple candidate results in the first reply, and the chat robot generates a second reply to the second message based on the user's selection. For example, Tables 12 to 14 below show example dialogues in which the user selects a candidate result in the chat robot's counter-question through the second message.
Table 12
Table 13
Table 14
In the examples of Tables 12 to 14, the chat robot recognizes multiple candidate results in the first reply and can actively ask the user a counter-question. The user can select one of the candidate results (for example, by naming its position in the list or by using a keyword related to the candidate result), thereby passively clarifying the first message. In this way, the user uses the second message to select a candidate result in the chat robot's counter-question, so that the chat robot can accurately identify the user's intention.
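As a non-limiting illustration of the two selection styles mentioned above (position in the list, or a keyword related to a candidate), the following Python sketch resolves a user's second message against a list of candidate results. The function name `select_candidate`, the ordinal vocabulary, and the sample candidates are assumptions for this example and are not taken from Tables 12 to 14.

```python
import re
from typing import Optional, Sequence

# Hypothetical ordinal vocabulary for position-based selection.
ORDINALS = {"first": 0, "second": 1, "third": 2, "fourth": 3, "fifth": 4}


def select_candidate(candidates: Sequence[str], reply: str) -> Optional[str]:
    """Pick one candidate from the counter-question, or None if unclear."""
    text = reply.lower()

    # 1) Position given as an ordinal word, e.g. "the second one".
    for word, index in ORDINALS.items():
        if re.search(rf"\b{word}\b", text) and index < len(candidates):
            return candidates[index]

    # 2) Position given as a 1-based number, e.g. "2".
    match = re.search(r"\b(\d+)\b", text)
    if match:
        index = int(match.group(1)) - 1
        if 0 <= index < len(candidates):
            return candidates[index]

    # 3) Keyword: accept only if exactly one candidate shares a word with the reply.
    reply_words = set(re.findall(r"\w+", text))
    hits = [c for c in candidates
            if reply_words & set(re.findall(r"\w+", c.lower()))]
    return hits[0] if len(hits) == 1 else None


options = ["Beijing West Railway Station", "Beijing South Railway Station"]
print(select_candidate(options, "the second one"))
print(select_candidate(options, "the south one"))
print(select_candidate(options, "the railway station"))
```

Returning `None` for an ambiguous reply leaves room for the chat robot to ask a further counter-question rather than guess.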
In some embodiments, during the chat dialogue between the user and the chat robot, when the user wishes to change a recognized word slot result, the user can modify it directly with a natural language message. For example, if the user's first message concerns booking a plane ticket to Beijing and the chat robot's first reply looks up flights to Beijing, then when the user's second message changes the destination word slot to Nanjing, the chat robot's second reply changes to looking up flights to Nanjing. In addition, when the chat robot has misidentified the intention, failed to identify it, produced an uncertain recognition result, or recognized multiple results, and has actively asked the user a counter-question, the user can also directly express a new intention; the chat robot can then accurately identify the new intention rather than remaining in the previous clarification process.
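As a non-limiting illustration of the Beijing-to-Nanjing example, the following Python sketch keeps the recognized intention and overrides only the word slot that the user changed. The names `DialogueState` and `apply_slot_update`, and the labels `book_flight` and `destination`, are assumptions introduced for this example.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class DialogueState:
    """Hypothetical container for the currently recognized intention and word slots."""
    intent: str
    slots: Dict[str, str] = field(default_factory=dict)


def apply_slot_update(state: DialogueState, updates: Dict[str, str]) -> DialogueState:
    """Return a new state in which the given word slots are overridden."""
    merged = dict(state.slots)   # copy so the previous state stays unchanged
    merged.update(updates)
    return DialogueState(intent=state.intent, slots=merged)


before = DialogueState("book_flight", {"destination": "Beijing"})
after = apply_slot_update(before, {"destination": "Nanjing"})
print(after)
```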
In some embodiments, the user's second message may be abusive language. In this case, based on the abusive second message, the chat robot can determine that its first reply failed to accurately identify the user's intention, and therefore uses an interrogative sentence in the second reply to request a further message or confirmation from the user. For example, the user may use abusive language such as "idiot" or "too stupid" to reject the chat robot's counter-question.
In some embodiments, when the chat robot's first reply is in counter-question form and none of the candidate results in the counter-question is accurate, the user can deny them directly with a natural language message, for example by sending "no" or "not right". The user can also directly send new message content as a correction. In this case, the chat robot can conclude that the recognition result in the first reply does not match the user's intention, and will adjust by providing more candidate results or by asking the user for further clarification or supplementary information.
In some embodiments, if the chat robot still has not accurately identified the user's intention, it can continue asking counter-questions, for example by issuing a second reply in counter-question form, a third reply in counter-question form, and so on, until it obtains an affirmative result from the user. Alternatively, a threshold number of counter-questions (for example, 3) can be set; once the threshold is exceeded, the chat robot stops asking even if it has not accurately identified the user's intention, ends the dialogue, and tells the user that it still does not understand what the user means. In some embodiments, if the user's second message expresses a wish to stop the chat dialogue, the chat robot stops the dialogue process accordingly.
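As a non-limiting illustration of the threshold described above, the following Python sketch runs a clarification loop that gives up after a fixed number of counter-questions or when the user asks to stop. The function name `run_clarification`, the `recognize` callback, the stop words, and the outcome labels are all assumptions for this example.

```python
from typing import Callable, Iterable, List, Optional, Tuple


def run_clarification(
    messages: Iterable[str],
    recognize: Callable[[List[str]], Optional[str]],
    max_questions: int = 3,
    stop_words: Tuple[str, ...] = ("stop", "quit"),
) -> Tuple[Optional[str], str]:
    """Return (intent, outcome) after consuming user messages in order.

    `recognize` is a hypothetical recognizer that sees the dialogue history
    and returns an intent label, or None if it cannot identify one.
    """
    history: List[str] = []
    questions_asked = 0
    for message in messages:
        if message.strip().lower() in stop_words:
            return None, "stopped_by_user"
        history.append(message)
        intent = recognize(history)
        if intent is not None:
            return intent, "recognized"
        if questions_asked >= max_questions:
            # Threshold reached: end the dialogue instead of asking again.
            return None, "gave_up"
        questions_asked += 1  # the chat robot asks another counter-question
    return None, "no_more_input"


never = lambda history: None
print(run_clarification(["a", "b", "c", "d", "e"], never))
```

With the default threshold of 3, the loop asks a counter-question after each of the first three failed messages and ends the dialogue on the fourth failure.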
In some embodiments, when the chat robot recognizes multiple candidate results, it provides them in the first reply for the user to choose from. If none of these candidate results matches the user's intention, the user can use the second message to reject them directly or to request a different batch of candidate results. Based on the user's second message, the chat robot then generates a second reply in counter-question form to request further clarification or supplementary information from the user.
Fig. 7 shows a block diagram of a device 700 for collecting training data according to an embodiment of the present disclosure. As shown in Fig. 7, the device 700 includes a first message obtaining module 710, a second message obtaining module 720 and a training data determining module 730. The first message obtaining module 710 is configured to obtain a first message from a user. The second message obtaining module 720 is configured to obtain, from the user, a second message directed to a first reply, where the first reply is generated based on the first message, and the first message and the second message are in natural language form. The training data determining module 730 is configured to, in response to determining that the second message semantically clarifies the first message, determine the first message as training data for training chat dialogue.
In some embodiments, the training data determining module 730 includes: an annotation information determining module configured to determine annotation information for the first message based on the second message's clarification of the semantics of the first message; and an association determining module configured to determine the first message in association with the annotation information as training data.
In some embodiments, the annotation information determining module includes an intention determining module configured to determine the intention type and word slot information of the first message, where the word slot information includes a word slot type and a word slot value.
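As a non-limiting illustration of how the first message might be stored in association with its annotation information, the following Python sketch defines a training sample that carries an intention type and word slot information (word slot type and word slot value). The class and function names, and the labels `navigation` and `destination`, are assumptions for this example; the semantic judgement of whether the second message clarifies the first message is passed in as a boolean because this passage does not specify how it is made.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class WordSlot:
    slot_type: str   # e.g. "destination" (hypothetical label)
    value: str       # e.g. "West Second Qi"


@dataclass(frozen=True)
class Annotation:
    intent_type: str
    slots: Tuple[WordSlot, ...]


@dataclass(frozen=True)
class TrainingSample:
    first_message: str
    annotation: Annotation


def collect_training_sample(
    first_message: str,
    second_message_clarifies: bool,
    clarified_annotation: Optional[Annotation],
) -> Optional[TrainingSample]:
    """Keep the first message as training data only if it was clarified."""
    if not second_message_clarifies or clarified_annotation is None:
        return None
    return TrainingSample(first_message, clarified_annotation)


annotation = Annotation("navigation", (WordSlot("destination", "West Second Qi"),))
print(collect_training_sample("I want to go to West Second Qi", True, annotation))
```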
In some embodiments, the device 700 further includes: a declarative sentence providing module configured to provide the first reply to the user, the first reply being in declarative sentence form and not accurately identifying the user's intention; and/or an interrogative sentence providing module configured to provide the first reply to the user, the first reply being in interrogative sentence form.
In some embodiments, the device 700 further includes: a first generation module configured to, in response to the second message including additional information for explaining the first message, generate a second reply to the second message based on the additional information; and a first providing module configured to provide the second reply to the user.
In some embodiments, the device 700 further includes: a second generation module configured to, in response to the second message including a correction of a recognition error or typing error in the first message, generate a second reply to the second message based on the correction; and a second providing module configured to provide the second reply to the user.
In some embodiments, the device 700 further includes: a third generation module configured to, in response to the second message including an answer to the counter-question in the first reply, generate a second reply to the second message based on the answer; and a third providing module configured to provide the second reply to the user.
In some embodiments, the third generation module includes a fourth generation module configured to generate the second reply based on the second message's selection among multiple candidate results in the first reply.
It should be understood that the first message obtaining module 710, the second message obtaining module 720 and the training data determining module 730 shown in Fig. 7 can be included in the annotation robot 130 described with reference to Fig. 1, or in the chat robot 120 (in the case where the chat robot 120 has an annotation function). Furthermore, it should be understood that the modules shown in Fig. 7 can perform the steps or actions of the methods or processes of the embodiments of the present disclosure.
Fig. 8 shows a schematic block diagram of example equipment 800 that can be used to implement embodiments of the present disclosure. It should be understood that the equipment 800 can be used to implement the device 700 for collecting training data, the chat robot 120, or the annotation robot 130 described in the present disclosure. As shown, the equipment 800 includes a central processing unit (CPU) 801, which can perform various appropriate actions and processing according to computer program instructions stored in a read-only memory (ROM) 802 or computer program instructions loaded from a storage unit 808 into a random access memory (RAM) 803. The RAM 803 can also store various programs and data required for the operation of the equipment 800. The CPU 801, the ROM 802 and the RAM 803 are connected to one another through a bus 804. An input/output (I/O) interface 805 is also connected to the bus 804.
Multiple components in the equipment 800 are connected to the I/O interface 805, including: an input unit 806, such as a keyboard or mouse; an output unit 807, such as various types of displays and loudspeakers; a storage unit 808, such as a magnetic disk or optical disc; and a communication unit 809, such as a network card, modem, or wireless communication transceiver. The communication unit 809 allows the equipment 800 to exchange information/data with other devices through a computer network such as the Internet and/or various telecommunication networks.
The processing unit 801 performs the methods and processes described above. For example, in some embodiments, a method or process can be implemented as a computer software program tangibly embodied in a machine-readable medium, such as the storage unit 808. In some embodiments, part or all of the computer program can be loaded and/or installed onto the equipment 800 via the ROM 802 and/or the communication unit 809. When the computer program is loaded into the RAM 803 and executed by the CPU 801, one or more actions or steps of the methods or processes described above can be performed. Alternatively, in other embodiments, the CPU 801 can be configured to perform the methods or processes in any other appropriate manner (for example, by means of firmware).
The functions described herein may be performed at least in part by one or more hardware logic components. For example, and without limitation, exemplary types of hardware logic components that may be used include: field programmable gate arrays (FPGA), application specific integrated circuits (ASIC), application specific standard products (ASSP), systems on chip (SOC), complex programmable logic devices (CPLD), and so on.
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. The program code may be provided to a processor or controller of a general-purpose computer, special-purpose computer, or other programmable data processing apparatus, such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowcharts and/or block diagrams to be implemented. The program code may execute entirely on a machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine, or entirely on the remote machine or server.
In the context of the present disclosure, a machine-readable medium may be a tangible medium that can contain or store a program for use by, or in connection with, an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium include an electrical connection based on one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
In addition, although the actions or steps are depicted in a particular order, this should not be understood as requiring that such actions or steps be performed in the particular order shown or in sequential order, or that all illustrated actions or steps be performed, to achieve the desired results. In certain circumstances, multitasking and parallel processing may be advantageous. Likewise, although the above discussion contains several specific implementation details, these should not be construed as limitations on the scope of the present disclosure. Certain features described in the context of separate embodiments can also be implemented in combination in a single implementation. Conversely, various features described in the context of a single implementation can also be implemented in multiple implementations separately or in any suitable sub-combination.
Although the embodiments of the present disclosure have been described in language specific to structural features and/or methodological actions, it should be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or actions described above. Rather, the specific features and actions described above are merely example forms of implementing the claims.
Claims (18)
1. A method for collecting training data, comprising:
obtaining a first message from a user;
obtaining, from the user, a second message directed to a first reply, the first reply being generated based on the first message, and the first message and the second message being in natural language form; and
in response to determining that the second message semantically clarifies the first message, determining the first message as training data for training chat dialogue.
2. The method according to claim 1, wherein determining the first message as the training data comprises:
determining annotation information for the first message based on the second message's clarification of the semantics of the first message; and
determining the first message in association with the annotation information as the training data.
3. The method according to claim 2, wherein determining the annotation information for the first message comprises:
determining the intention type and word slot information of the first message, the word slot information including a word slot type and a word slot value.
4. The method according to claim 1, further comprising at least one of the following:
providing the first reply to the user, the first reply being in declarative sentence form and not accurately identifying the user's intention; and
providing the first reply to the user, the first reply being in interrogative sentence form.
5. The method according to claim 1, further comprising:
in response to the second message including additional information for explaining the first message, generating a second reply to the second message based on the additional information; and
providing the second reply to the user.
6. The method according to claim 1, further comprising:
in response to the second message including a correction of a recognition error or typing error in the first message, generating a second reply to the second message based on the correction; and
providing the second reply to the user.
7. The method according to claim 1, further comprising:
in response to the second message including an answer to a counter-question in the first reply, generating a second reply to the second message based on the answer; and
providing the second reply to the user.
8. The method according to claim 7, wherein generating the second reply based on the answer further comprises:
generating the second reply based on the second message's selection among multiple candidate results in the first reply.
9. A device for collecting training data, comprising:
a first message obtaining module configured to obtain a first message from a user;
a second message obtaining module configured to obtain, from the user, a second message directed to a first reply, the first reply being generated based on the first message, and the first message and the second message being in natural language form; and
a training data determining module configured to, in response to determining that the second message semantically clarifies the first message, determine the first message as training data for training chat dialogue.
10. The device according to claim 9, wherein the training data determining module comprises:
an annotation information determining module configured to determine annotation information for the first message based on the second message's clarification of the semantics of the first message; and
an association determining module configured to determine the first message in association with the annotation information as the training data.
11. The device according to claim 10, wherein the annotation information determining module comprises:
an intention determining module configured to determine the intention type and word slot information of the first message, the word slot information including a word slot type and a word slot value.
12. The device according to claim 10, further comprising at least one of the following:
a declarative sentence providing module configured to provide the first reply to the user, the first reply being in declarative sentence form and not accurately identifying the user's intention; and
an interrogative sentence providing module configured to provide the first reply to the user, the first reply being in interrogative sentence form.
13. The device according to claim 10, further comprising:
a first generation module configured to, in response to the second message including additional information for explaining the first message, generate a second reply to the second message based on the additional information; and
a first providing module configured to provide the second reply to the user.
14. The device according to claim 10, further comprising:
a second generation module configured to, in response to the second message including a correction of a recognition error or typing error in the first message, generate a second reply to the second message based on the correction; and
a second providing module configured to provide the second reply to the user.
15. The device according to claim 10, further comprising:
a third generation module configured to, in response to the second message including an answer to a counter-question in the first reply, generate a second reply to the second message based on the answer; and
a third providing module configured to provide the second reply to the user.
16. The device according to claim 15, wherein the third generation module comprises:
a fourth generation module configured to generate the second reply based on the second message's selection among multiple candidate results in the first reply.
17. Electronic equipment, comprising:
one or more processors; and
a storage device for storing one or more programs which, when executed by the one or more processors, cause the electronic equipment to implement the method according to any one of claims 1-8.
18. A computer readable storage medium having a computer program stored thereon, wherein the program, when executed by a processor, implements the method according to any one of claims 1-8.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810553778.3A CN108763548A (en) | 2018-05-31 | 2018-05-31 | Collect method, apparatus, equipment and the computer readable storage medium of training data |
Publications (1)
Publication Number | Publication Date |
---|---|
CN108763548A (en) | 2018-11-06 |
Family
ID=64001833
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810553778.3A Pending CN108763548A (en) | 2018-05-31 | 2018-05-31 | Collect method, apparatus, equipment and the computer readable storage medium of training data |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108763548A (en) |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20181106 |