CN114528851B - Reply sentence determination method, reply sentence determination device, electronic equipment and storage medium - Google Patents

Reply sentence determination method, reply sentence determination device, electronic equipment and storage medium

Info

Publication number
CN114528851B
Authority
CN
China
Prior art keywords
feature, intention, vector, keyword, user
Prior art date
Legal status
Active
Application number
CN202210148787.0A
Other languages
Chinese (zh)
Other versions
CN114528851A
Inventor
黄天来
梁必志
叶怡周
Current Assignee
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd filed Critical Ping An Technology Shenzhen Co Ltd
Priority to CN202210148787.0A priority Critical patent/CN114528851B/en
Publication of CN114528851A publication Critical patent/CN114528851A/en
Application granted granted Critical
Publication of CN114528851B publication Critical patent/CN114528851B/en


Classifications

    • G06F 40/35 — Handling natural language data; semantic analysis; discourse or dialogue representation
    • G06F 18/214 — Pattern recognition; generating training patterns; bootstrap methods, e.g. bagging or boosting
    • G06F 18/22 — Pattern recognition; matching criteria, e.g. proximity measures
    • G06F 18/241 — Pattern recognition; classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F 40/279 — Natural language analysis; recognition of textual entities
    • G10L 15/02 — Speech recognition; feature extraction; selection of recognition unit
    • G10L 15/26 — Speech recognition; speech-to-text systems
    • G10L 25/51 — Speech or voice analysis specially adapted for comparison or discrimination
    • Y02D 10/00 — Energy-efficient computing, e.g. low-power processors, power management or thermal management

Abstract

The application relates to the technical field of artificial intelligence, and particularly discloses a reply sentence determination method and device, an electronic device, and a storage medium. The method includes: acquiring a user's voice information at the current moment, and parsing the voice information to obtain text; extracting features from the text to obtain a feature X; obtaining the number of samples in a feature library; when the number of samples is less than or equal to a first threshold, sending the voice information to a human agent, and receiving the human agent's intention analysis result for the voice information to obtain an intention A; performing secondary confirmation with the user according to the intention A; when the secondary confirmation passes, combining the intention A with the feature X, storing the combined result into the feature library as a sample, and generating a reply sentence according to the intention A to reply to the user; when the secondary confirmation does not pass, generating rejection information and sending it to the user.

Description

Reply sentence determination method, reply sentence determination device, electronic equipment and storage medium
Technical Field
The invention relates to the technical field of artificial intelligence, and in particular to a reply sentence determination method, a reply sentence determination device, an electronic device, and a storage medium.
Background
With the arrival of the intelligent age, intelligent dialogue systems are widely used in the customer service field to replace traditional human agents in simple exchanges with users, such as scene navigation, service navigation, and simple information queries, thereby reducing enterprises' labor costs. An existing intelligent dialogue system starts the business process corresponding to a user's intention only after it has obtained that intention accurately. For example, when the system recognizes from the user's voice that the current intention is "car scratch paint repair claim consultation", it matches the corresponding flow chart and dialogue library according to that intention, generates a reply sentence for the user's current voice input, starts the dialogue service, and helps the customer solve the problem. A traditional intelligent dialogue system therefore depends heavily on accurate recognition of the user's intention: when the intention cannot be recognized accurately, the system falls into a situation it cannot answer, and can only reject the request, ask the user to explain again, or hang up directly, resulting in poor user experience.
The traditional solution is therefore to first accumulate a large number of dialogue samples and train the model of the intelligent dialogue system, improving the accuracy of intention recognition on users' voice input. Clearly, under this approach, the larger the number of samples, the higher the accuracy of the resulting model.
In summary, conventional methods require a great deal of time to accumulate raw data. For a new field, however, no large body of samples has been accumulated, since the business has only just been opened up; at the same time, an intelligent dialogue system is needed immediately to handle a large number of user consultations, leaving no time to collect samples. Conventional methods therefore cannot be applied to building an intelligent dialogue system in a newly established field, and an intelligent dialogue scheme is needed that can be used directly under low-sample or even zero-sample conditions while simultaneously accumulating samples.
Disclosure of Invention
In order to solve the above problems in the prior art, embodiments of the present application provide a reply sentence determination method, apparatus, electronic device, and storage medium that can be used directly under low-sample or even zero-sample conditions while simultaneously accumulating samples.
In a first aspect, an embodiment of the present application provides a reply sentence determination method, including:
acquiring a user's voice information at the current moment, and parsing the voice information to obtain text;
extracting features from the text to obtain a feature X;
obtaining the number of samples in a feature library;
when the number of samples is less than or equal to a first threshold, sending the voice information to a human agent, and receiving the human agent's intention analysis result for the voice information to obtain an intention A;
performing secondary confirmation with the user according to the intention A;
when the secondary confirmation passes, combining the intention A with the feature X, storing the combined result into the feature library as a sample, and generating a reply sentence according to the intention A to reply to the user;
when the secondary confirmation does not pass, generating rejection information and sending it to the user.
In a second aspect, embodiments of the present application provide a reply sentence determining apparatus, including:
a parsing module, configured to acquire the user's voice information at the current moment and parse the voice information to obtain a text;
an extraction module, configured to extract features from the text to obtain a feature X;
a processing module, configured to acquire the intention of the text to obtain an intention A, perform secondary confirmation with the user according to the intention A, and, when the secondary confirmation passes, generate a reply sentence according to the intention A to reply to the user.
In a third aspect, embodiments of the present application provide an electronic device, including: a processor coupled to a memory, the memory being configured to store a computer program, and the processor being configured to execute the computer program stored in the memory to cause the electronic device to perform the method of the first aspect.
In a fourth aspect, embodiments of the present application provide a computer-readable storage medium storing a computer program that causes a computer to perform the method of the first aspect.
In a fifth aspect, embodiments of the present application provide a computer program product comprising a non-transitory computer-readable storage medium storing a computer program that is operable to cause a computer to perform the method of the first aspect.
The implementation of the embodiment of the application has the following beneficial effects:
It can be seen that in the embodiments of the present application, when the feature library lacks sample support, the user's intention is identified by a human agent to ensure the accuracy of intention recognition under low-sample or zero-sample conditions. Meanwhile, secondary confirmation with the customer is performed on the intention recognized by the human agent, further improving its correctness, and the confirmed intention is combined with the features extracted from the user's voice and stored as a sample. Thus, even with few or no samples, correct operation of the intelligent dialogue system is ensured, repeated rejections or hang-ups caused by insufficient intention-recognition accuracy are avoided, and user experience is improved. At the same time, secondary confirmation guarantees the correctness of the intention, and correct samples are accumulated in the course of service, which further improves the accuracy and efficiency of subsequent training.
Drawings
In order to illustrate the technical solutions in the embodiments of the present application more clearly, the drawings needed in the description of the embodiments are briefly introduced below. The drawings described below are some embodiments of the present application; for a person skilled in the art, other drawings may be obtained from them without inventive effort.
Fig. 1 is a flow chart of a reply sentence determining method according to an embodiment of the present application;
fig. 2 is a flow chart of a method for parsing the user's voice information at the current moment to obtain text according to an embodiment of the present application;
fig. 3 is a schematic flow chart of a method for extracting features from text to obtain a feature X according to an embodiment of the present application;
fig. 4 is a flowchart of a method for calculating similarity according to an embodiment of the present application;
fig. 5 is a flowchart of a method for performing secondary confirmation of an intention A with the user according to an embodiment of the present application;
fig. 6 is a schematic hardware structure of a reply sentence determining apparatus according to an embodiment of the present application;
fig. 7 is a functional block diagram of a reply sentence determining apparatus according to an embodiment of the present application;
Fig. 8 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
The following description of the embodiments of the present application is given clearly and completely with reference to the accompanying drawings. It is evident that the embodiments described are some, but not all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments herein without inventive effort fall within the scope of the present application.
The terms "first," "second," "third," and "fourth" and the like in the description and in the claims of this application and in the drawings, are used for distinguishing between different objects and not for describing a particular sequential order. Furthermore, the terms "comprise" and "have," as well as any variations thereof, are intended to cover a non-exclusive inclusion. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those listed steps or elements but may include other steps or elements not listed or inherent to such process, method, article, or apparatus.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the application. The appearances of such phrases in various places in the specification are not necessarily all referring to the same embodiment, nor are they separate or alternative embodiments mutually exclusive of other embodiments. Those skilled in the art will understand, explicitly and implicitly, that the embodiments described herein may be combined with other embodiments.
Firstly, it should be noted that the reply sentence determination method provided by the present application is applicable to scenarios such as remote intelligent voice customer service dialogue, offline intelligent robot scene navigation, and service guidance on intelligent self-service terminals. In this embodiment, the remote intelligent voice customer service dialogue scenario is taken as an example to describe the method; the method in other scenarios is similar and is not repeated here.
Second, it should be noted that the embodiments disclosed herein may acquire and process related data based on artificial intelligence techniques. Artificial intelligence (AI) is the theory, method, technique, and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use knowledge to obtain optimal results.
Artificial intelligence infrastructure technologies generally include sensors, dedicated AI chips, cloud computing, distributed storage, big data processing, operation/interaction systems, and mechatronics. AI software technologies mainly include computer vision, robotics, biometric recognition, speech processing, natural language processing, and machine learning/deep learning.
Referring to fig. 1, fig. 1 is a flowchart of a reply sentence determining method according to an embodiment of the present application. The reply sentence determining method comprises the following steps:
101: acquiring a user's voice information at the current moment, and parsing the voice information to obtain text.
Specifically, after the user establishes communication with the intelligent customer service robot through a communication device, the robot receives the voice information input by the user through that device and parses it to obtain text. In an alternative embodiment, for example in offline intelligent robot scene navigation, the intelligent robot may collect the user's voice through a voice collection device such as an onboard microphone. In other words, any real-time voice collection method in the art can be applied here; this embodiment does not limit the manner in which voice is collected.
Meanwhile, this embodiment provides a method for parsing the user's voice information at the current moment to obtain text, as shown in fig. 2. The method includes:
201: performing audio extraction on the voice information to obtain a pinyin text.
In this embodiment, audio features may be extracted from the user's voice information at the current moment, and then further analyzed and decomposed to obtain the corresponding pinyin text. By way of example, if the user's voice information at the current moment is "I want to handle a car scratch paint repair claim", audio extraction yields the pinyin text "woxiangyaobanliqicheguahenbuqilipei".
202: dividing the pinyin text to obtain at least one sub-pinyin.
In this embodiment, each of the at least one sub-pinyin identifies one syllable of the pronunciation. Continuing the "car scratch paint repair claim" example: after the pinyin text "woxiangyaobanliqicheguahenbuqilipei" is obtained, it may be divided according to pinyin composition rules to obtain at least one sub-pinyin. Specifically, the initials and finals in the pinyin text are first identified, the text is split into single initials and single finals, and the initials and finals are then recombined according to pinyin composition rules to obtain the sub-pinyins. When splitting the pinyin text, the first initial and the first final are identified. To ensure the first final is identified correctly, the pinyin letters before and after it are examined: if both are initials, the letter in question is part of the preceding final. For example, "hanghang" could be split into initials and finals as [h, ang, h, ang] or [h, an, g, h, an, g]; examining the third and fourth letters of [h, an, g, h, an, g] shows two consecutive initials with no final after the third letter [g], so this split is incorrect and the first one should be chosen. Thus, after initial/final recognition, the pinyin text "woxiangyaobanliqicheguahenbuqilipei" yields the character string [w, o, x, i, ang, y, ao, b, an, l, i, q, i, ch, e, g, u, a, h, en, b, u, q, i, l, i, p, ei]. The string is then recognized and combined from the first character onward according to pinyin composition rules: when the next initial is recognized, the preceding characters are combined into one sub-pinyin, and recognition continues until the last character. After splitting and recombining the string in this example, the at least one sub-pinyin [wo, xiang, yao, ban, li, qi, che, gua, hen, bu, qi, li, pei] is obtained.
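As a concrete illustration of this splitting step, the following is a minimal Python sketch. The INITIALS and FINALS tables are abbreviated, the helper name split_pinyin is hypothetical, and greedy longest-final matching is used as a simplification of the look-ahead rule described above; it resolves the "hanghang" case the same way.

```python
# Abbreviated pinyin tables, sorted longest-first so "zh" matches before "z"
# and "ang" before "an"; a full implementation would list every initial/final.
INITIALS = sorted(["b", "p", "m", "f", "d", "t", "n", "l", "g", "k", "h",
                   "j", "q", "x", "zh", "ch", "sh", "r", "z", "c", "s",
                   "y", "w"], key=len, reverse=True)
FINALS = sorted(["a", "o", "e", "i", "u", "ai", "ei", "ao", "ou", "an",
                 "en", "in", "un", "ang", "eng", "ing", "ong", "ia", "ie",
                 "iu", "iao", "ian", "iang", "ua", "uo", "ui", "uai", "uan",
                 "uang"], key=len, reverse=True)

def split_pinyin(s: str) -> list[str]:
    """Split a toneless pinyin string into sub-pinyins (syllables)."""
    syllables, i = [], 0
    while i < len(s):
        # longest initial at position i (a syllable may also lack one)
        initial = next((p for p in INITIALS if s.startswith(p, i)), "")
        j = i + len(initial)
        # longest final following the initial
        final = next((f for f in FINALS if s.startswith(f, j)), None)
        if final is None:
            raise ValueError(f"no valid syllable at: {s[i:]}")
        syllables.append(initial + final)
        i = j + len(final)
    return syllables

print(split_pinyin("hanghang"))
# ['hang', 'hang'] -- the longest final 'ang' wins, so 'g' is never stranded
print(split_pinyin("woxiangyaobanliqicheguahenbuqilipei"))
# ['wo', 'xiang', 'yao', 'ban', 'li', 'qi', 'che', 'gua', 'hen', 'bu', 'qi', 'li', 'pei']
```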
203: acquiring the application scenario of the user's voice information at the current moment.
In this embodiment, the customer service number dialed by the user when communicating with the intelligent customer service robot can be obtained and matched against a preset customer service number classification table to obtain the application scenario corresponding to that number. Different customer service numbers in the table may be preset to correspond to different application scenarios, for example: customer service number "10087" corresponds to a "maintenance" scenario, and customer service number "10089" corresponds to an "insurance" scenario. On this basis, continuing the "car scratch paint repair claim" example, when the number currently dialed by the user is "10089", matching it against the preset classification table determines that the application scenario is "insurance".
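The table lookup itself is trivial; the sketch below shows it with the two illustrative numbers from the text. Storing the preset classification table as a Python dict is an assumption, not something the patent specifies.

```python
# Preset customer-service-number classification table (illustrative values).
SCENE_TABLE = {"10087": "maintenance", "10089": "insurance"}

def application_scene(dialed_number):
    """Return the application scenario for a dialed customer service number."""
    return SCENE_TABLE.get(dialed_number)  # None if the number is unknown

print(application_scene("10089"))  # 'insurance'
```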
204: determining the preset word library corresponding to the application scenario according to the application scenario of the user's voice information at the current moment.
In this embodiment, filtering the word library by application scenario reduces the number of candidate words and improves the efficiency of confirming the text. Moreover, because the semantics of the candidate words are consistent with the application scenario of the voice information, the accuracy of the subsequently generated text is improved.
205: matching each sub-pinyin in the preset word library to obtain at least one group of first words corresponding one-to-one to the at least one sub-pinyin.
In this embodiment, each sub-pinyin is compared with the pinyin of the candidate words in the preset word library: the semantics of the sub-pinyin are recognized and matched against the words in the library, and the words with the highest matching degree are selected as the group of first words corresponding to the sub-pinyin. For example, take "li" from the sub-pinyins [wo, xiang, yao, ban, li, qi, che, gua, hen, bu, qi, li, pei] for detailed explanation: in the preset word library, the group of first words corresponding to the sub-pinyin "li" may be "reason", "benefit", or "lining".
206: determining, for each sub-pinyin, the target word within its corresponding group of first words according to the adjacent sub-pinyins, obtaining at least one target word corresponding one-to-one to the at least one sub-pinyin.
In this embodiment, phrases may be obtained by combining the first words of each sub-pinyin with those of its left and/or right neighbor. The phrases are matched against the application scenario of the user's voice information at the current moment, and the target word that best fits the scenario is selected from the group of first words; a sketch of this selection follows the example below. For example, in the preset word library, the group of first words corresponding to the sub-pinyin "li" includes "reason", "benefit", and "lining", and the group of first words corresponding to the right-adjacent sub-pinyin "pei" includes "claim", "match", and "accompany". Combining them yields nine candidate phrases, one per pair. Matching these phrases against the application scenario "insurance" shows that the "reason"+"claim" phrase (i.e. "claims") has the highest matching degree with the scenario, so the first word "reason" is taken as the target word corresponding to the sub-pinyin "li".
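A hedged sketch of this selection: each first word of "li" is paired with each first word of the right-adjacent "pei", and the candidate whose best phrase scores highest against the scenario wins. The scorer scene_match_score is a hypothetical callable; the patent does not specify the scoring model.

```python
def pick_target_word(candidates, right_candidates, scene, scene_match_score):
    """Pick the candidate whose best adjacent-phrase fits the scenario best."""
    best_word, best_score = None, float("-inf")
    for word in candidates:             # e.g. 'reason', 'benefit', 'lining'
        for right in right_candidates:  # e.g. 'claim', 'match', 'accompany'
            score = scene_match_score(word + right, scene)
            if score > best_score:
                best_word, best_score = word, score
    return best_word  # 'reason' when the scene is 'insurance'
```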
207: arranging the at least one target word according to the order of the at least one sub-pinyin in the pinyin text to obtain the text.
In this embodiment, after the target word corresponding to each sub-pinyin is obtained through the filtering in step 206, the target words may be arranged in the order in which their sub-pinyins appear in the pinyin text to obtain the text. Specifically, continuing the example, the series of operations above yields the target word for each of the sub-pinyins [wo, xiang, yao, ban, li, qi, che, gua, hen, bu, qi, li, pei], i.e. the words spelling out "I", "want to", "handle", "car", "scratch", "paint repair", and "claim". Arranging these target words in the order of their sub-pinyins in the pinyin text "woxiangyaobanliqicheguahenbuqilipei" gives the text "I want to handle a car scratch paint repair claim".
102: extracting features from the text to obtain a feature X.
In this embodiment, a method for extracting features from the text to obtain a feature X is provided, as shown in fig. 3. The method includes:
301: performing word splitting on the text to obtain at least one keyword.
In this embodiment, delimiter characters in the text may be identified and the identified delimiters replaced with spaces to obtain at least one candidate field. The delimiter characters may be set in advance and include, but are not limited to: verbs, nouns, punctuation marks, and special symbols. Each of the at least one candidate field is then subjected to forward maximum matching against a universal word segmentation dictionary, and the successfully matched dictionary words are taken as the candidate words corresponding to that field. Finally, the obtained candidate words are filtered to obtain at least one keyword.
By way of example, the candidate words may be filtered by comparing their semantics. Specifically, splitting the text "I want to handle a car scratch paint repair claim" on verb and noun delimiters yields candidate words such as "I", "want to", "handle", "car", "scratch", "paint repair", and "claim". Comparing the semantics of these candidate words, and taking into account the application scenario the user is currently in, the keywords are determined to be "car", "scratch", "paint repair", and "claim". A forward-maximum-matching sketch follows.
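The following is a minimal sketch of classic forward maximum matching; the toy dictionary stands in for the "universal word segmentation dictionary" mentioned above, and the window size of 4 is an assumption.

```python
def forward_max_match(text, dictionary, max_len=4):
    """Greedily match the longest dictionary word at each position."""
    words, i = [], 0
    while i < len(text):
        for length in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + length]
            if length == 1 or piece in dictionary:  # single chars always pass
                words.append(piece)
                i += length
                break
    return words

toy_dict = {"汽车", "刮痕", "补漆", "理赔", "办理", "想要"}
print(forward_max_match("我想要办理汽车刮痕补漆理赔", toy_dict))
# ['我', '想要', '办理', '汽车', '刮痕', '补漆', '理赔']
```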
302: calculating the association degree between any two different keywords in the at least one keyword to obtain at least one association degree.
In this embodiment, any two adjacent keywords among the at least one keyword are first combined to obtain a second word combination. The second word combination is matched against the application scenario and the matching degree is scored; when the score is greater than a fifth threshold, the two keywords are determined to be associated, and the score is taken as their association degree. Continuing the example in which the keywords are "car", "scratch", "paint repair", and "claim": combining "car" with "scratch" gives the second word combination "car scratch"; matching "car scratch" against the application scenario "insurance" yields a score of 95, which is greater than the fifth threshold, so the two keywords are determined to be associated, with an association degree of 95.
303: constructing a keyword graph according to the at least one association degree and the at least one keyword.
In this embodiment, a fully connected graph is first created with the keywords as vertices; then, according to the fully connected graph and the association degrees, the edges whose association degree is lower than a fourth threshold are deleted, generating the keyword graph corresponding to the keywords.
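A sketch of steps 302–303 follows, using networkx for the graph structure (a convenience choice, not something the patent prescribes). The association callable is a placeholder for the phrase/scenario scoring described in step 302.

```python
import itertools
import networkx as nx

def build_keyword_graph(keywords, association, fourth_threshold):
    """Fully connect the keywords, then prune edges below the threshold."""
    g = nx.Graph()
    g.add_nodes_from(keywords)
    for a, b in itertools.combinations(keywords, 2):  # full connection
        score = association(a, b)
        if score >= fourth_threshold:  # keep only sufficiently associated pairs
            g.add_edge(a, b, weight=score)
    return g
```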
304: performing graph embedding on each keyword in the at least one keyword according to the keyword graph to obtain at least one first graph vector, the at least one first graph vector corresponding one-to-one to the at least one keyword.
In this embodiment, the keywords are first determined as graph nodes, and a homogeneous graph is constructed from these nodes. Finally, a deep-walk graph-embedding model is called to perform graph embedding on the constructed co-occurrence graph, outputting the first graph vector corresponding to each keyword.
305: performing word embedding on each keyword to obtain at least one first word vector corresponding one-to-one to the at least one keyword.
306: for each graph vector in the at least one first graph vector, calculating the average of that graph vector and the word vector corresponding to it, obtaining at least one first vector, the at least one first vector corresponding one-to-one to the at least one keyword.
In this embodiment, each first graph vector is added to the word vector of the corresponding keyword, and the average is then calculated to obtain the average vector, yielding the at least one first vector corresponding one-to-one to the at least one keyword. Illustratively, as above, if the word vector of the keyword "car" is (1, 2) and its graph vector is (5, 6), the summed vector is (6, 8), and the average vector (3, 4) is calculated as the first vector of the keyword "car".
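The averaging step, reproduced with the (1, 2)/(5, 6) example from the text; nothing beyond element-wise arithmetic is assumed.

```python
import numpy as np

word_vec = np.array([1, 2])   # word vector of "car"
graph_vec = np.array([5, 6])  # graph vector of "car"
first_vec = (word_vec + graph_vec) / 2  # summed vector (6, 8), then averaged
print(first_vec)  # [3. 4.] -- the first vector of the keyword "car"
```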
307: splicing the at least one first vector according to the order of the at least one keyword in the text to obtain the feature X.
In this embodiment, continuing the "car scratch paint repair claim" example: the keyword "car" corresponds to a first vector A, the keyword "scratch" to a first vector B, the keyword "paint repair" to a first vector C, and the keyword "claim" to a first vector D, and the keywords are arranged in the order of the original text, i.e. "car", "scratch", "paint repair", "claim". The first vectors are spliced vertically from top to bottom to obtain a vector P, which serves as the feature X of the text "car scratch paint repair claim". Specifically, the vector P can be expressed by formula (1), i.e. the vertical concatenation of the first vectors in text order:

P = [A; B; C; D]  (1)
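A minimal sketch of the splicing, assuming the first vectors are numpy arrays; the 2-d values of A–D below are illustrative, not from the patent.

```python
import numpy as np

A, B, C, D = (np.array(v) for v in ([3, 4], [1, 0], [2, 2], [0, 5]))
P = np.concatenate([A, B, C, D])  # text order: car, scratch, paint repair, claim
print(P)  # [3 4 1 0 2 2 0 5] -- feature X of the text
```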
103: obtaining the number of samples in the feature library.
104: when the number of samples is less than or equal to a first threshold, sending the voice information to a human agent, and receiving the human agent's intention analysis result for the voice information to obtain an intention A.
In this embodiment, when the number of samples is less than or equal to the first threshold, there are not enough samples in the database to support accurate recognition of the user's voice information by the intelligent customer service robot. Therefore, the user's voice information at the current moment can be sent to a human agent, and the intention A corresponding to the voice information is determined through the human agent's recognition.
Meanwhile, in the present embodiment, when the number of samples is greater than the first threshold, the database already holds a certain amount of sample data as support: not enough to train a sufficiently accurate intention recognition model, but with enough sample features per intention to support feature comparison. On this basis, when performing intention recognition on the user's voice information, a similarity calculation can be performed between the feature X and each of the N samples in the feature library, obtaining N similarities corresponding one-to-one to the N samples, where N is an integer greater than or equal to 1. The maximum among the N similarities is then determined as the target similarity, and when the target similarity is greater than a second threshold, the intention B corresponding to the target similarity is taken as the intention A of the user's voice information at the current moment.
For example, when the number of samples in the feature library reaches a certain number M (the first threshold), after features are extracted from the user's voice information at the current moment, a similarity calculation may be performed between the extracted features and each of the features collected in the feature library to obtain the similarities S. Specifically, if the feature library includes J intentions, each corresponding to 500 samples, the similarity calculation yields J × 500 similarities. The maximum similarity S_max can then be found among the J × 500 similarities and compared with the preset second threshold. When S_max is greater than the second threshold, the intention of the sample with the maximum similarity S_max is taken as the intention of the user's voice information at the current moment.
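A sketch of this comparison stage: score the feature X against every sample, take the best, and reject when it is not confident. The similarity callable is the formula-(2) function sketched after step 404 below; the (intention, feature) tuple layout of the library is an assumption.

```python
def match_intention(x, feature_library, second_threshold, similarity):
    """Return the intention of the most similar sample, or None to reject."""
    best_intention, s_max = None, float("-inf")
    for intention_b, feat in feature_library:  # (intention, feature) samples
        s = similarity(x, feat)
        if s > s_max:
            best_intention, s_max = intention_b, s
    # None signals rejection, e.g. routing the call to a human agent
    return best_intention if s_max > second_threshold else None
```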
Based on this, the present embodiment provides a method for calculating the similarity, as shown in fig. 4. The method includes:
401: calculating the product of the feature X and the feature vector corresponding to each sample to obtain a vector product F.
402: calculating the product of the modulus of the feature X and the modulus of the feature vector corresponding to each sample to obtain a length product E.
403: calculating the sum of the length product E and a constant C to obtain a length sum G.
In this embodiment, the constant C may be an integer greater than or equal to 1; it prevents the case where the product of the modulus of the feature X and the modulus of the sample feature vector is 0, which would make the formula invalid.
404: taking the ratio of the vector product F to the length sum G as the similarity between the feature X and each sample.
Specifically, the similarity can be expressed by formula (2):

S = (a · b) / (|a| · |b| + C)  (2)

where S is the similarity, a is the vector of the feature X, b is the feature vector corresponding to each sample, |a| is the modulus of the vector of the feature X, |b| is the modulus of the feature vector corresponding to each sample, and C is a custom parameter that may be an integer greater than or equal to 1; in this embodiment it may be equal to 1.

Further, the modulus |a| of the vector of the feature X can be expressed by formula (3):

|a| = √(V_1² + V_2² + … + V_d²)  (3)

where V_1–V_d are the elements of the vector of the feature X. Likewise, the modulus |b| of the feature vector corresponding to each sample can be expressed by formula (4):

|b| = √(X_1² + X_2² + … + X_d²)  (4)

where X_1–X_d are the elements of the feature vector corresponding to each sample.
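A direct transcription of formulas (2)–(4) into numpy, with C = 1 as in the text:

```python
import numpy as np

def similarity(a, b, c=1.0):
    """Similarity between feature X (a) and a sample feature vector (b)."""
    f = float(np.dot(a, b))                           # vector product F (401)
    e = float(np.linalg.norm(a) * np.linalg.norm(b))  # length product E (402)
    g = e + c                                         # length sum G (403)
    return f / g                                      # similarity (404)
```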
in this embodiment, when the number of samples is greater than the third threshold, it is explained that the number of samples accumulated in the feature library is sufficient to support training of the model, resulting in an accurate intention recognition model. Therefore, at this stage, the accumulated samples in the sample library may be input into the initial model for training to obtain the classification model, and then the feature X may be input into the classification model to obtain the intention a. Wherein the third threshold is greater than the first threshold.
Specifically, in the present embodiment, the initial model may employ a natural language understanding (Natural Language Understanding, NLU) model. The NLU model is essentially a feature extraction network plus a classifier, features are extracted through the feature extraction network, then the features are input into the classifier for classification, the output labels are scoring values of all accurate intentions, the result which is highest in scoring and is greater than a preset threshold T is selected as the final result, and rejection (intention cannot be identified) is output if the result is smaller than the threshold. Assuming a total of N business scenarios, there are N intents (A 1 ,A 2 ,A 3 ,…,A e ) The data is collected in practice in such a way as to be used in the early stage when each intention is to be collectedWhen the collected features X reach a certain number M, such as more than 500 features are collected for each intention, a feature library is obtained (each intention takes a feature library formed by M features).
In this embodiment, after the data amount is acquired to a certain extent, that is, when the number of samples is greater than the third threshold, conventional NLU model training may be performed. The user intention is output through the NLU model, and the extracted feature X is additionally output. If the refusal is identified, the user is pushed to the seat personnel, the seat manually judges the giving intention A, and the data is stored to obtain a feature library of the bad case. In this embodiment, any number of NLU models can be started (M are randomly selected when the number is greater than M), so that the artificial seat pressure during the period of the next online NLU model training can be relieved.
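A hedged sketch of this NLU stage: a feature extractor plus a classifier with a rejection threshold T. The extractor and classifier objects are placeholders, since the patent does not fix a concrete architecture.

```python
import numpy as np

def classify_intention(text, extractor, classifier, intentions, t):
    """Return (intention A, feature X), or (None, X) for rejection."""
    x = extractor(text)        # feature X from the feature extraction network
    scores = classifier(x)     # one score per intention A_1 .. A_N
    best = int(np.argmax(scores))
    if scores[best] < t:
        return None, x         # rejection: route the call to a human agent
    return intentions[best], x  # intention A plus the feature to store
```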
105: determining whether the secondary confirmation performed with the user according to the intention A passes; when it passes, the process proceeds to step 106, and when it does not pass, the process proceeds to step 107.
In the present embodiment, the secondary confirmation serves to confirm with the user, before the subsequent service is performed, whether the recognized intention A is accurate, so as to ensure recognition accuracy. This embodiment provides a method for performing the secondary confirmation of the intention A with the user, as shown in fig. 5. The method includes:
501: generating a confirmation sentence according to the intention A.
In this embodiment, the intention A "car scratch paint repair claim" can be converted into a question such as "Is your request a car scratch paint repair claim?" as the confirmation sentence, to confirm with the user whether the recognized intention A is correct.
502: sending the confirmation sentence to the user and receiving the user's feedback information.
In this embodiment, the confirmation sentence is sent to the user, the user's feedback is received as voice, and the intention of the feedback is recognized. In addition, in offline scenarios, "yes"/"no" options can be displayed to the user through a display device so that the user can confirm directly.
503: when the feedback information is "yes", determining that the secondary confirmation passes.
504: when the feedback information is "no", determining that the secondary confirmation does not pass.
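A sketch of steps 501–504, with the ask/listen channel abstracted behind two callables; send_to_user and receive_yes_no are hypothetical I/O hooks, not APIs from the patent.

```python
def secondary_confirmation(intention_a, send_to_user, receive_yes_no):
    """Return True when the user confirms the recognized intention A."""
    confirmation = f"Is your request: {intention_a}?"  # step 501
    send_to_user(confirmation)                         # step 502
    return receive_yes_no()                            # steps 503/504
```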
106: combining the intention A with the feature X, storing the combined result into the feature library as a sample, and generating a reply sentence according to the intention A to reply to the user.
In this embodiment, combining the accurate intention A obtained through the user's secondary confirmation with the extracted feature X yields an accurate sample with which to fill the feature library, so that samples accumulate while the system serves users. Moreover, because of the secondary confirmation, each resulting sample is essentially a correct positive sample for its intention. Illustratively, the accurate intention A is combined with the corresponding feature X in the form (A, X) and stored in the feature library as a sample.
107: generating rejection information and sending the rejection information to the user.
In summary, in the reply sentence determination method provided by the present application, when the number of samples in the feature library is less than or equal to the first threshold, the library lacks sample support, and the intention is identified by combining the human agent with the feature extraction network and output to the user. The recognized intention is confirmed with the customer a second time, and the subsequent service is performed after the user's feedback is obtained; when the feedback is a confirmation, the user's voice information and the corresponding intention are stored together in the feature library, accumulating data. When the number of samples is greater than the first threshold and less than or equal to the third threshold, the feature library has a certain sample basis, and the feature extraction network is still used on the user's voice information; at this stage, a similarity calculation is performed between the extracted features and the existing sample features, and the intention corresponding to the maximum similarity is obtained. Secondary confirmation is again required, the subsequent service is performed after user feedback, and confirmed voice/intention pairs are again added to the feature library. When the number of samples in the feature library is greater than the third threshold, conventional intention recognition is performed, the intention and the corresponding features are output, and secondary confirmation, subsequent service, and sample storage proceed as before. Through this series of processes, the intelligent customer service robot is trained up from zero. This not only relieves the pressure on human agents, but also solves the early cold-start problem, enhances the reliability of the intelligent customer service robot system, reduces enterprise costs, and improves the experience of incoming customers.
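As a capstone, the three-stage regime the summary describes can be sketched as a dispatcher. Only the control flow is taken from the text; the thresholds and helper callables are placeholders.

```python
def determine_reply(x, voice, feature_library, first_t, third_t,
                    ask_human_agent, match_by_similarity, model_predict):
    """Dispatch intention recognition by how many samples have accumulated."""
    n = len(feature_library)
    if n <= first_t:                    # cold start: a human agent decides
        return ask_human_agent(voice)
    if n <= third_t:                    # some samples: feature comparison
        return match_by_similarity(x, feature_library)
    return model_predict(x)             # enough samples: trained NLU model
```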
Referring to fig. 6, fig. 6 is a schematic hardware structure of a reply sentence determining apparatus according to an embodiment of the present application. The reply sentence determining apparatus 600 comprises at least one processor 601, a communication line 602, a memory 603 and at least one communication interface 604.
In this embodiment, the processor 601 may be a general-purpose central processing unit (CPU), a microprocessor, an application-specific integrated circuit (ASIC), or one or more integrated circuits for controlling the execution of the programs of the present application.
Communication line 602 may include a pathway to transfer information between the aforementioned components.
The communication interface 604 may be any transceiver-like device (e.g., an antenna) for communicating with other devices or communication networks, such as Ethernet, a RAN, or a wireless local area network (WLAN).
The memory 603 may be, but is not limited to, a read-only memory (ROM) or other type of static storage device that can store static information and instructions, a random access memory (RAM) or other type of dynamic storage device that can store information and instructions, an electrically erasable programmable read-only memory (EEPROM), a compact disc read-only memory (CD-ROM) or other optical disk storage (including compact discs, laser discs, optical discs, digital versatile discs, and Blu-ray discs), magnetic disk storage media or other magnetic storage devices, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer.
In this embodiment, the memory 603 may exist independently and be connected to the processor 601 through the communication line 602, or may be integrated with the processor 601. The memory 603 provided by embodiments of the present application is generally non-volatile. The memory 603 is used to store the computer-executable instructions for executing the solutions of the present application, and execution is controlled by the processor 601: the processor 601 is configured to execute the computer-executable instructions stored in the memory 603 to implement the methods provided in the embodiments of the present application.
In alternative embodiments, computer-executable instructions may also be referred to as application code, which is not specifically limited in this application.
In alternative embodiments, processor 601 may include one or more CPUs, such as CPU0 and CPU1 of FIG. 6.
In an alternative embodiment, the reply sentence determining apparatus 600 may include a plurality of processors, such as the processor 601 and the processor 607 in fig. 6. Each of these processors may be a single-core (single-CPU) processor or may be a multi-core (multi-CPU) processor. A processor herein may refer to one or more devices, circuits, and/or processing cores for processing data (e.g., computer program instructions).
In an alternative embodiment, the reply sentence determining apparatus 600 may be a server, for example a stand-alone server, or a cloud server providing cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, a content delivery network (CDN), and basic cloud computing services such as big data and artificial intelligence platforms. The reply sentence determining apparatus 600 may further include an output device 605 and an input device 606. The output device 605 communicates with the processor 601 and may display information in a variety of ways; for example, it may be a liquid crystal display (LCD), a light-emitting diode (LED) display device, a cathode ray tube (CRT) display device, or a projector. The input device 606 communicates with the processor 601 and may receive user input in a variety of ways; for example, it may be a mouse, a keyboard, a touch screen device, or a sensing device.
The reply sentence determining apparatus 600 may be a general-purpose device or a special-purpose device. The present embodiment does not limit the type of reply sentence determining apparatus 600.
Referring to fig. 7, fig. 7 is a functional block diagram of a reply sentence determining apparatus according to an embodiment of the present application. As shown in fig. 7, the reply sentence determination device includes:
the parsing module 701 is configured to acquire the user's voice information at the current moment and parse the voice information to obtain a text;
the extraction module 702 is configured to perform feature extraction on the text to obtain a feature X;
the processing module 703 is configured to obtain the number of samples in the feature library; send the voice information to a human agent when the number of samples is less than or equal to the first threshold; receive the human agent's intention analysis result for the voice information to obtain an intention A; perform secondary confirmation with the user according to the intention A; and, when the secondary confirmation passes, generate a reply sentence according to the intention A to reply to the user.
In the embodiment of the present invention, in performing the secondary confirmation with the user according to the intention A, the processing module 703 is specifically configured to:
generate a confirmation sentence according to the intention A;
send the confirmation sentence to the user and receive the user's feedback information;
when the feedback information is "yes", determine that the secondary confirmation passes;
when the feedback information is "no", determine that the secondary confirmation does not pass.
In the embodiment of the present invention, in parsing the user's voice information at the current moment to obtain a text, the parsing module 701 is specifically configured to:
perform audio extraction on the voice information to obtain a pinyin text;
divide the pinyin text to obtain at least one sub-pinyin, each sub-pinyin identifying one syllable of the pronunciation;
acquire the application scenario of the user's voice information at the current moment;
determine the preset word library corresponding to the application scenario according to the application scenario of the user's voice information at the current moment;
match each sub-pinyin in the preset word library to obtain at least one group of first words corresponding one-to-one to the at least one sub-pinyin;
determine, for each sub-pinyin, the target word in its corresponding group of first words according to the adjacent sub-pinyins, obtaining at least one target word corresponding one-to-one to the at least one sub-pinyin;
arrange the at least one target word according to the order of the at least one sub-pinyin in the pinyin text to obtain the text.
In the embodiment of the present invention, in extracting features from the text to obtain the feature X, the extraction module 702 is specifically configured to:
perform word splitting on the text to obtain at least one keyword;
calculate the association degree between any two different keywords in the at least one keyword to obtain at least one association degree;
construct a keyword graph according to the at least one association degree and the at least one keyword;
perform graph embedding on each keyword in the at least one keyword according to the keyword graph to obtain at least one first graph vector corresponding one-to-one to the at least one keyword;
perform word embedding on each keyword to obtain at least one first word vector corresponding one-to-one to the at least one keyword;
for each graph vector in the at least one first graph vector, calculate the average of that graph vector and the word vector corresponding to it, obtaining at least one first vector corresponding one-to-one to the at least one keyword;
splice the at least one first vector according to the order of the at least one keyword in the text to obtain the feature X.
In an embodiment of the present invention, when the number of samples is greater than the first threshold, the processing module 703 is specifically configured to:
perform a similarity calculation between the feature X and each of the N samples in the feature library to obtain N similarities corresponding one-to-one to the N samples, where N is an integer greater than or equal to 1;
determine the target similarity among the N similarities, the target similarity being the largest of the N similarities;
when the target similarity is greater than a second threshold, acquire the intention B corresponding to the target similarity and take the intention B as the intention A.
In the embodiment of the present invention, in calculating the similarity, the processing module 703 is specifically configured to:
calculate the product of the feature X and the feature vector corresponding to each sample to obtain a vector product F;
calculate the product of the modulus of the feature X and the modulus of the feature vector corresponding to each sample to obtain a length product E;
calculate the sum of the length product E and a constant C to obtain a length sum G, the constant C being an integer greater than or equal to 1;
take the ratio of the vector product F to the length sum G as the similarity between the feature X and each sample.
Specifically, the similarity between the feature X and the feature vector corresponding to each sample can be expressed by formula (5), which is the same as formula (2):

S = (a · b) / (|a| · |b| + C)  (5)

where S is the similarity, a is the vector of the feature X, b is the feature vector corresponding to each sample, |a| and |b| are their moduli, and C is a custom parameter that may be an integer greater than or equal to 1; in this embodiment it may be equal to 1.
In an embodiment of the present invention, when the number of samples is greater than the third threshold, the processing module 703 is specifically configured to:
input the samples in the feature library into an initial model for training to obtain a classification model, the third threshold being greater than the first threshold;
input the feature X into the classification model to obtain the intention A.
Referring to fig. 8, fig. 8 is a schematic structural diagram of an electronic device according to an embodiment of the present application. As shown in fig. 8, the electronic device 800 includes a transceiver 801, a processor 802, and a memory 803, which are connected by a bus 804. The memory 803 is used to store computer programs and data, and the data stored in the memory 803 can be transferred to the processor 802.
The processor 802 is configured to read a computer program in the memory 803 to perform the following operations:
acquiring voice information of a user at the current moment, and analyzing the voice information to obtain a text;
extracting features from the text to obtain a feature X;
obtaining the number of samples in a feature library;
when the number of samples is smaller than or equal to a first threshold, sending the voice information to a human agent, and receiving the human agent's intention analysis result for the voice information to obtain an intention A;
performing secondary confirmation processing with the user according to the intention A;
when the secondary confirmation processing passes, combining the intention A and the feature X, storing the combined result as a sample in the feature library, and generating a reply sentence according to the intention A so as to reply to the user;
and when the secondary confirmation processing does not pass, generating rejection information and sending it to the user. A sketch of this overall dispatch follows this list.
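Pulling the branches together, the dispatch on sample count might look like the sketch below, reusing the helpers sketched earlier; `ask_manual_agent` and `confirm_with_user` are hypothetical stubs for the human-agent hand-off and the secondary confirmation described next, and the threshold values are invented.

```python
def ask_manual_agent(voice_info):
    """Hypothetical stub: forward the audio to a human agent who returns
    an intention A (the real system receives the agent's analysis result)."""
    return "check_balance"

def confirm_with_user(intent):
    """Hypothetical stub: send a confirmation sentence, read back yes/no."""
    return True

def determine_intention(x, voice_info, feature_library,
                        first_threshold=50, third_threshold=500):
    """Sketch of the three-regime dispatch on feature-library size."""
    n = len(feature_library)
    if n <= first_threshold:                       # too few samples: ask a human
        intent = ask_manual_agent(voice_info)
        if confirm_with_user(intent):              # secondary confirmation passes
            feature_library.append((x, intent))    # store (feature X, intention A)
            return intent
        return None                                # rejection-information path
    if n > third_threshold:                        # enough samples: classifier
        return train_and_classify(feature_library, x)
    return retrieve_intention(x, feature_library)  # otherwise nearest sample
```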
In an embodiment of the present invention, in performing the secondary confirmation processing with the user according to the intention A, the processor 802 is specifically configured to perform the following operations:
generating a confirmation statement according to the intention A;
sending a confirmation statement to a user and receiving feedback information of the user;
when the feedback information is yes, determining that the secondary confirmation processing passes;
and when the feedback information is no, determining that the secondary confirmation processing does not pass. A small sketch of this confirmation exchange follows.
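As a minimal sketch of this exchange, with the confirmation wording and the yes/no parsing invented for illustration, and the send/receive channels injected as callables:

```python
def secondary_confirmation(intent, send, receive):
    """Send a confirmation sentence built from intention A and map the
    user's feedback to pass/fail (wording and parsing are illustrative)."""
    send(f"Just to confirm: you would like to {intent}, is that right? (yes/no)")
    return receive().strip().lower() == "yes"

# Example wiring in a console session:
#   passed = secondary_confirmation("check_balance", print, input)
```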
In the embodiment of the present invention, in analyzing the voice information of the current moment of the user to obtain the text, the processor 802 is specifically configured to perform the following operations:
extracting the pinyin of the voice information from the audio to obtain a pinyin text;
dividing the pinyin text to obtain at least one sub-pinyin, wherein each sub-pinyin in the at least one sub-pinyin identifies one syllable of the pronunciation;
acquiring the application scene of the voice information of the user at the current moment;
determining a preset word stock corresponding to the application scene according to the application scene of the voice information of the user at the current moment;
matching in the preset word stock according to each sub-pinyin to obtain at least one group of first words, wherein the at least one group of first words corresponds to the at least one sub-pinyin one by one;
determining a target word in the first-word group corresponding to each sub-pinyin according to the sub-pinyins adjacent to that sub-pinyin, to obtain at least one target word, wherein the at least one target word corresponds to the at least one sub-pinyin one by one;
and arranging the at least one target word according to the arrangement sequence of the at least one sub-pinyin in the pinyin text to obtain the text. A toy code sketch of this syllable-matching step follows.
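The sketch below illustrates the syllable-matching idea under stated assumptions: the scene lexicon mapping each pinyin syllable to candidate words, the bigram scores used for adjacent-syllable disambiguation, and the greedy left-to-right strategy are all invented stand-ins for whatever the deployment scene actually provides.

```python
def decode_pinyin(syllables, lexicon, bigram_score):
    """Pick one target word per sub-pinyin, using the previously chosen
    word to disambiguate via adjacent-syllable scores (greedy sketch)."""
    words = []
    for syl in syllables:
        candidates = lexicon[syl]            # the first-word group for this syllable
        prev = words[-1] if words else None
        # Score each candidate against the word chosen for the adjacent syllable.
        best = max(candidates, key=lambda w: bigram_score.get((prev, w), 0.0))
        words.append(best)
    return "".join(words)

# Hypothetical scene lexicon: "cha" -> check/tea, "zhang" -> account/chapter.
lexicon = {"cha": ["查", "茶"], "zhang": ["账", "章"]}
bigram = {(None, "查"): 1.0, ("查", "账"): 1.0}
text = decode_pinyin(["cha", "zhang"], lexicon, bigram)  # -> "查账"
```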
In an embodiment of the present invention, in extracting features from the text to obtain the feature X, the processor 802 is specifically configured to perform the following operations:
performing word splitting processing on the text to obtain at least one keyword;
Calculating the association degree between any two different keywords in at least one keyword to obtain at least one association degree;
constructing a keyword graph according to at least one relevancy and at least one keyword;
performing graph embedding processing on each keyword in at least one keyword according to the keyword graph to obtain at least one first graph vector, wherein the at least one first graph vector corresponds to the at least one keyword one by one;
carrying out word embedding processing on each keyword to obtain at least one first word vector, wherein the at least one first word vector corresponds to the at least one keyword one by one;
for each first graph vector in the at least one first graph vector, calculating the average vector of that graph vector and the first word vector corresponding to it, to obtain at least one first vector, wherein the at least one first vector corresponds to the at least one keyword one by one;
and splicing the at least one first vector according to the sequence of the at least one keyword in the text to obtain the feature X.
In an embodiment of the present invention, when the number of samples is greater than the first threshold, the processor 802 is specifically configured to:
calculating the similarity between the feature X and each of N samples in the feature library to obtain N similarities, wherein the N similarities correspond one-to-one to the N samples, and N is an integer greater than or equal to 1;
determining a target similarity among the N similarities, wherein the target similarity is the largest of the N similarities;
and when the target similarity is larger than a second threshold, acquiring an intention B corresponding to the target similarity, and taking the intention B as the intention A.
In an embodiment of the present invention, the processor 802 is specifically configured to perform the following operations in terms of calculating the similarity:
calculating the inner product of the feature X and the feature vector corresponding to each sample to obtain a vector product F;
calculating the product of the modulus of the feature X and the modulus of the feature vector corresponding to each sample to obtain a length product E;
calculating the sum of the length product E and a constant C to obtain a length sum G, wherein the constant C is an integer greater than or equal to 1;
and taking the ratio of the vector product F to the length sum G as the similarity between the feature X and each sample.
Specifically, the similarity between the feature X and the feature vector corresponding to each sample can be expressed by formula (6):

S = (a · b) / (|a| · |b| + C)    (6)

where S is the similarity, b is the vector of the feature X, a is the feature vector corresponding to each sample, |b| is the modulus of the feature X, |a| is the modulus of the feature vector corresponding to each sample, and C is a custom parameter, which may be an integer greater than or equal to 1; in this embodiment it may be equal to 1.
In an embodiment of the present invention, when the number of samples is greater than the third threshold, the processor 802 is specifically configured to:
inputting samples in a sample library into an initial model for training to obtain a classification model, wherein a third threshold value is larger than a first threshold value;
and inputting the feature X into the classification model to obtain the intention A.
It should be understood that the reply sentence determining apparatus in the present application may include a smartphone (such as an Android phone, an iOS phone, or a Windows Phone device), a tablet computer, a palmtop computer, a notebook computer, a mobile internet device (MID), a robot, a wearable device, and the like. The above reply sentence determining apparatuses are merely examples and not an exhaustive list. In practical applications, the reply sentence determining apparatus may further include an intelligent vehicle-mounted terminal, a computer device, and the like.
From the above description of the embodiments, it will be apparent to those skilled in the art that the present invention may be implemented in software in combination with a hardware platform. With such understanding, all of the technical solution of the present invention, or the part that contributes over the prior art, may be embodied in the form of a software product, which may be stored in a storage medium such as a ROM/RAM, a magnetic disk, or an optical disk, and which includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform the methods of the various embodiments, or of portions of the embodiments, of the present invention.
Accordingly, the present application also provides a computer-readable storage medium storing a computer program that is executed by a processor to implement some or all of the steps of any one of the reply sentence determination methods described in the above method embodiments. For example, the storage medium may include a hard disk, a floppy disk, an optical disk, a magnetic tape, a magnetic disk, a flash memory, etc.
The present application also provides a computer program product comprising a non-transitory computer-readable storage medium storing a computer program operable to cause a computer to perform some or all of the steps of any one of the reply sentence determination methods described in the method embodiments above.
It should be noted that, for simplicity of description, the foregoing method embodiments are all described as a series of combined actions, but those skilled in the art should understand that the present application is not limited by the order of actions described, as some steps may be performed in another order or simultaneously. Further, those skilled in the art will also appreciate that the embodiments described in the specification are all optional embodiments, and that the actions and modules involved are not necessarily required by the present application.
In the foregoing embodiments, the descriptions of the embodiments are focused on, and for those portions of one embodiment that are not described in detail, reference may be made to the related descriptions of other embodiments.
In the several embodiments provided in this application, it should be understood that the disclosed apparatus may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative: the division of units is merely a division of logical functions, and there may be other divisions in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections via interfaces, devices, or units, and may be electrical or take other forms.
The units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the embodiment.
In addition, each functional unit in each embodiment of the present application may be integrated into one processing unit, each unit may exist alone physically, or two or more units may be integrated into one unit. The integrated units described above may be implemented either in hardware or in software program modules.
The integrated units, if implemented in the form of software program modules and sold or used as a stand-alone product, may be stored in a computer-readable memory. Based on such understanding, the technical solution of the present application, in essence, or the part that contributes over the prior art, or all or part of the technical solution, may be embodied in the form of a software product stored in a memory, which includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods of the embodiments of the present application. The aforementioned memory includes various media capable of storing program code, such as a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk, or an optical disk.
Those of ordinary skill in the art will appreciate that all or a portion of the steps in the various methods of the above embodiments may be implemented by a program instructing associated hardware; the program may be stored in a computer-readable memory, which may include a flash disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
The foregoing has described the embodiments of the present application in detail, and specific examples have been used herein to explain the principles and implementations of the present application; the above description of the embodiments is only intended to help in understanding the methods and core ideas of the present application. Meanwhile, since those skilled in the art may make changes to the specific implementations and the scope of application according to the ideas of the present application, the contents of this specification should not be construed as limiting the present application.

Claims (6)

1. A reply sentence determining method, the method comprising:
acquiring voice information of a user at the current moment, and analyzing the voice information to obtain text;
performing word splitting processing on the text to obtain at least one keyword;
Calculating the association degree between any two different keywords in the at least one keyword to obtain at least one association degree;
constructing a keyword graph according to the at least one relevance and the at least one keyword;
performing graph embedding processing on each keyword in the at least one keyword according to the keyword graph to obtain at least one first graph vector, wherein the at least one first graph vector corresponds to the at least one keyword one by one;
carrying out word embedding processing on each keyword to obtain at least one first word vector, wherein the at least one first word vector corresponds to the at least one keyword one by one;
for each first graph vector in the at least one first graph vector, calculating the average vector of that graph vector and the first word vector corresponding to it, to obtain at least one first vector, wherein the at least one first vector corresponds to the at least one keyword one by one;
splicing the at least one first vector according to the sequence of the at least one keyword in the text to obtain a feature X;
obtaining the number of samples in a feature library;
when the number of the samples is smaller than or equal to a first threshold, sending the voice information to a human agent, and receiving the human agent's intention analysis result for the voice information to obtain an intention A;
when the number of the samples is larger than the first threshold, calculating the inner product of the feature X and the feature vector corresponding to each sample in N samples in the feature library to obtain a vector product F;
calculating the product of the modulus of the feature X and the modulus of the feature vector corresponding to each sample to obtain a length product E;
calculating the sum of the length product E and a constant C to obtain a length sum G, wherein the constant C is an integer greater than or equal to 1;
obtaining the ratio of the vector product F to the length sum G, and taking the ratio as the similarity between the feature X and each sample, so as to obtain N similarities, wherein the N similarities correspond one-to-one to the N samples, and N is an integer greater than or equal to 1;
determining target similarity among the N similarities, wherein the target similarity is the largest similarity among the N similarities;
when the target similarity is larger than a second threshold, acquiring an intention B corresponding to the target similarity, and taking the intention B as the intention A;
when the number of the samples is larger than a third threshold, inputting the samples in the feature library into an initial model for training to obtain a classification model, wherein the third threshold is larger than the first threshold;
inputting the feature X into the classification model to obtain the intention A;
performing secondary confirmation processing with the user according to the intention A;
when the secondary confirmation processing passes, combining the intention A and the feature X, storing the combined result as a sample into the feature library, and generating a reply sentence according to the intention A so as to reply to the user;
and when the secondary confirmation processing does not pass, generating rejection information, and sending the rejection information to the user.
2. The method according to claim 1, wherein the performing a secondary confirmation process to the user according to the intention a includes:
generating a confirmation statement according to the intention A;
sending the confirmation statement to the user and receiving feedback information of the user;
when the feedback information is yes, determining that the secondary confirmation processing passes;
and when the feedback information is no, determining that the secondary confirmation processing does not pass.
3. The method of claim 1, wherein obtaining the voice information of the user at the current time and analyzing the voice information to obtain the text comprises:
extracting the pinyin of the voice information from the audio to obtain a pinyin text;
Dividing the pinyin text to obtain at least one sub pinyin, wherein each sub pinyin in the at least one sub pinyin is used for marking one syllable in pronunciation;
acquiring an application scene of the voice information of the user at the current moment;
determining a preset word stock corresponding to the application scene according to the application scene of the voice information of the user at the current moment;
matching in a preset word stock according to each sub pinyin to obtain at least one group of first words, wherein the at least one group of first words corresponds to the at least one sub pinyin one by one;
determining target words in the first word group corresponding to each sub-pinyin according to the sub-pinyin adjacent to each sub-pinyin to obtain at least one target word, wherein the at least one target word corresponds to the at least one sub-pinyin one by one;
and arranging the at least one target word according to the arrangement sequence of the at least one sub pinyin in the pinyin text to obtain the text.
4. A reply sentence determining apparatus, characterized by comprising:
the analysis module is used for acquiring the voice information of the user at the current moment and analyzing the voice information to obtain a text;
The extraction module is used for performing word splitting processing on the text to obtain at least one keyword, calculating the association degree between any two different keywords in the at least one keyword to obtain at least one association degree, constructing a keyword graph according to the at least one association degree and the at least one keyword, performing graph embedding processing on each keyword in the at least one keyword according to the keyword graph to obtain at least one first graph vector, performing word embedding processing on each keyword to obtain at least one first word vector, wherein the at least one first word vector corresponds to the at least one keyword one by one, calculating, for each first graph vector in the at least one first graph vector, the average vector of that graph vector and the first word vector corresponding to it to obtain at least one first vector, wherein the at least one first vector corresponds to the at least one keyword one by one, and splicing the at least one first vector according to the sequence of the at least one keyword in the text to obtain a feature X;
The processing module is used for obtaining the number of samples in a feature library; when the number of samples is smaller than or equal to a first threshold, sending the voice information to a human agent, and receiving the human agent's intention analysis result for the voice information to obtain an intention A; when the number of samples is greater than the first threshold, calculating the inner product of the feature X and the feature vector corresponding to each sample in N samples in the feature library to obtain a vector product F, calculating the product of the modulus of the feature X and the modulus of the feature vector corresponding to each sample to obtain a length product E, calculating the sum of the length product E and a constant C to obtain a length sum G, wherein the constant C is an integer greater than or equal to 1, obtaining the ratio of the vector product F to the length sum G, and taking the ratio as the similarity between the feature X and each sample, so as to obtain N similarities, wherein the N similarities correspond one-to-one to the N samples, and N is an integer greater than or equal to 1; determining a target similarity among the N similarities, wherein the target similarity is the largest of the N similarities; when the target similarity is larger than a second threshold, acquiring an intention B corresponding to the target similarity, and taking the intention B as the intention A; when the number of samples is greater than a third threshold, inputting the samples in the feature library into an initial model for training to obtain a classification model, wherein the third threshold is greater than the first threshold, and inputting the feature X into the classification model to obtain the intention A; performing secondary confirmation processing with the user according to the intention A; and when the secondary confirmation processing passes, generating a reply sentence according to the intention A so as to reply to the user.
5. An electronic device comprising a processor, a memory, a communication interface, and one or more programs, wherein the one or more programs are stored in the memory and configured for execution by the processor, the one or more programs comprising instructions for performing the steps of the method of any of claims 1-3.
6. A computer readable storage medium, characterized in that the computer readable storage medium stores a computer program, which is executed by a processor to implement the method of any of claims 1-3.
CN202210148787.0A 2022-02-17 2022-02-17 Reply sentence determination method, reply sentence determination device, electronic equipment and storage medium Active CN114528851B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210148787.0A CN114528851B (en) 2022-02-17 2022-02-17 Reply sentence determination method, reply sentence determination device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210148787.0A CN114528851B (en) 2022-02-17 2022-02-17 Reply sentence determination method, reply sentence determination device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN114528851A CN114528851A (en) 2022-05-24
CN114528851B true CN114528851B (en) 2023-07-25

Family

ID=81622667

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210148787.0A Active CN114528851B (en) 2022-02-17 2022-02-17 Reply sentence determination method, reply sentence determination device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN114528851B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115952271B (en) * 2023-03-09 2023-06-27 杭州心识宇宙科技有限公司 Method and device for generating dialogue information, storage medium and electronic equipment

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2018032213A (en) * 2016-08-24 2018-03-01 シャープ株式会社 Information processor, information processing system, information processing method and program
CN108804612A (en) * 2018-05-30 2018-11-13 武汉烽火普天信息技术有限公司 A kind of text sentiment classification method based on counter propagation neural network model
CN111563164A (en) * 2020-05-07 2020-08-21 成都信息工程大学 Specific target emotion classification method based on graph neural network
CN112365894A (en) * 2020-11-09 2021-02-12 平安普惠企业管理有限公司 AI-based composite voice interaction method and device and computer equipment
CN112417102A (en) * 2020-11-26 2021-02-26 中国科学院自动化研究所 Voice query method, device, server and readable storage medium
CN112632244A (en) * 2020-12-18 2021-04-09 平安普惠企业管理有限公司 Man-machine conversation optimization method and device, computer equipment and storage medium
CN113378545A (en) * 2021-06-08 2021-09-10 北京邮电大学 Aspect level emotion analysis method and device, electronic equipment and storage medium
CN113377928A (en) * 2021-08-11 2021-09-10 明品云(北京)数据科技有限公司 Text recommendation method, system, device and medium

Also Published As

Publication number Publication date
CN114528851A (en) 2022-05-24

Similar Documents

Publication Publication Date Title
CN110020424B (en) Contract information extraction method and device and text information extraction method
CN107291783B (en) Semantic matching method and intelligent equipment
WO2022105122A1 (en) Answer generation method and apparatus based on artificial intelligence, and computer device and medium
US20140351228A1 (en) Dialog system, redundant message removal method and redundant message removal program
CN111241237B (en) Intelligent question-answer data processing method and device based on operation and maintenance service
CN104503998B (en) For the kind identification method and device of user query sentence
CN111310440B (en) Text error correction method, device and system
CN112699645B (en) Corpus labeling method, apparatus and device
CN107436916B (en) Intelligent answer prompting method and device
CN110162675B (en) Method and device for generating answer sentence, computer readable medium and electronic device
CN113094478B (en) Expression reply method, device, equipment and storage medium
CN114595686A (en) Knowledge extraction method, and training method and device of knowledge extraction model
CN112579733A (en) Rule matching method, rule matching device, storage medium and electronic equipment
CN114387061A (en) Product pushing method and device, electronic equipment and readable storage medium
CN113051380A (en) Information generation method and device, electronic equipment and storage medium
CN111368066B (en) Method, apparatus and computer readable storage medium for obtaining dialogue abstract
CN109635125B (en) Vocabulary atlas building method and electronic equipment
CN114528851B (en) Reply sentence determination method, reply sentence determination device, electronic equipment and storage medium
CN110610003A (en) Method and system for assisting text annotation
CN110795942A (en) Keyword determination method and device based on semantic recognition and storage medium
CN115617974B (en) Dialogue processing method, device, equipment and storage medium
CN111783425A (en) Intention identification method based on syntactic analysis model and related device
CN113704623B (en) Data recommendation method, device, equipment and storage medium
CN109727591B (en) Voice search method and device
CN113870478A (en) Rapid number-taking method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant