CN111292752B - User intention recognition method and device, electronic equipment and storage medium - Google Patents
Info
- Publication number
- CN111292752B (application CN201811490105.4A)
- Authority
- CN
- China
- Prior art keywords
- text
- voice recognition
- category
- sample
- user intention
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/063—Training
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/10—Speech classification or search using distance or distortion measures between unknown speech and reference templates
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/18—Speech classification or search using natural language modelling
- G10L15/1822—Parsing for meaning understanding
Abstract
The present application relates to the field of data processing technologies, and in particular to a user intention recognition method and apparatus, an electronic device, and a storage medium. The method includes: acquiring a target voice recognition text; extracting a text feature vector from the target voice recognition text; and inputting the extracted text feature vector into a pre-trained user intention recognition model to determine the user intention category corresponding to the target voice recognition text and the intention information under that category. Because the user intention is recognized by a trained user intention recognition model, the scheme avoids the high labor cost of manual labeling while ensuring recognition accuracy.
Description
Technical Field
The present invention relates to the field of data processing technologies, and in particular to a user intention recognition method and apparatus, an electronic device, and a storage medium.
Background
With the development of artificial intelligence, voice recognition, natural language processing, and spoken language understanding (Spoken Language Understanding, SLU) technologies, task-oriented chat robots have emerged. Such a chat robot is aimed at a specific task: it obtains the information required to complete the task through multiple rounds of voice interaction with the user and finally completes the task the user specifies. For example, for the task of helping a user book a flight ticket, the robot needs to recognize the user's booking intention from the text recognized from the user's voice input.
A user intention recognition method in the related art generally relies on a large amount of intention-labeled data: the acquired voice input text of the user is labeled manually, and an intention recognition model is trained on the labeled text to realize user intention recognition.
However, this manual labeling approach consumes a great deal of labor cost.
Disclosure of Invention
In view of the foregoing, an object of the embodiments of the present application is to provide a user intention recognition method and apparatus, an electronic device, and a storage medium, so as to ensure recognition accuracy while reducing labor cost.
The main aspects are as follows:
in a first aspect, an embodiment of the present application provides a method for identifying a user intention, where the method includes:
acquiring a target voice recognition text;
extracting text feature vectors from the target voice recognition text;
and inputting the extracted text feature vector into a pre-trained user intention recognition model, and determining a user intention category corresponding to the target voice recognition text and intention information under the user intention category.
In one embodiment, the target speech recognition text is determined according to the steps of:
responding to the acquired user voice, and determining a reference voice recognition text corresponding to the user voice;
and inputting the determined reference voice recognition text into a voice search engine to obtain a target voice recognition text corresponding to the reference voice recognition text.
In another embodiment, extracting text feature vectors from the target speech recognition text includes:
dividing the target voice recognition text sequentially to obtain a plurality of target voice recognition sub-texts;
sequentially inputting each target voice recognition sub-text into a pre-trained vector conversion model to obtain text feature vectors corresponding to each target voice recognition sub-text;
and combining the text feature vectors corresponding to all the target voice recognition sub-texts into the text feature vector of the target voice recognition text.
In yet another embodiment, the user intent recognition model is trained according to the following steps:
acquiring a sample voice recognition text set;
extracting, for each sample voice recognition text in the sample voice recognition text set, a text feature vector of the sample voice recognition text, and determining a category feature vector of the user intention category corresponding to the sample voice recognition text and the intention information under the user intention category;
and taking the text feature vector of the sample voice recognition text as the input of the user intention recognition model to be trained, taking the category feature vector of the corresponding user intention category and the intention information under that category as the output of the model to be trained, and training to obtain the user intention recognition model.
In some embodiments, after obtaining the set of sample speech recognition texts, before determining the category feature vector of the user intent category corresponding to the sample speech recognition text, further comprising:
determining a user intention category corresponding to each sample voice recognition text in the sample voice recognition text set;
determining a subset of sample speech recognition text corresponding to each user intent category, the subset of sample speech recognition text comprising at least one sample speech recognition text;
for each sample voice recognition text subset, determining a category feature vector of a user intention category corresponding to the sample voice recognition text subset according to text feature vectors of at least one sample voice recognition text included in the sample voice recognition text subset;
determining a category feature vector for a user intent category corresponding to the sample speech recognition text, comprising:
and taking the determined category characteristic vector of the user intention category corresponding to the sample voice recognition text subset as the category characteristic vector of the user intention category corresponding to any sample voice recognition text included in the sample voice recognition text subset.
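The disclosure does not fix how a subset's text feature vectors are combined into one category feature vector; averaging is one natural reading, sketched below (the function name and the toy 3-d vectors are assumptions):

```python
import numpy as np

def category_feature_vector(subset_vectors):
    """Aggregate a subset's text feature vectors into one category feature vector.

    Averaging is one plausible choice; the patent leaves the exact
    aggregation function unspecified.
    """
    return np.mean(np.asarray(subset_vectors, dtype=float), axis=0)

# Two sample texts of the same intent category, as toy 3-d feature vectors.
subset = [[1.0, 0.0, 2.0],
          [3.0, 2.0, 0.0]]
cat_vec = category_feature_vector(subset)  # → array([2., 1., 1.])
```

Under this reading, every sample text in the subset is then assigned the same `cat_vec` as its category feature vector, matching the step above.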
In some embodiments, after obtaining the set of sample speech recognition texts, before determining the category feature vector of the user intent category corresponding to the sample speech recognition text, further comprising:
for any sample voice recognition text, searching the sample voice recognition texts according to a preset text search strategy to obtain an expanded sample voice recognition text subset, where the user intention category corresponding to each sample voice recognition text in the expanded subset is the same as the user intention category corresponding to that sample voice recognition text;
determining a category feature vector of the user intention category corresponding to the expanded sample voice recognition text subset;
determining a category feature vector of the user intention category corresponding to the sample voice recognition text, comprising:
and taking the determined category feature vector of the user intention category corresponding to the expanded sample voice recognition text subset as the category feature vector of the user intention category corresponding to the sample voice recognition text.
In still another embodiment, the training to obtain the user intention recognition model includes:
determining initial category feature vectors for each user intent category;
for each sample speech recognition text in the sample speech recognition text set, determining attention information between the sample speech recognition text and each user intention category based on a text feature vector of the sample speech recognition text and an initial category feature vector of each user intention category;
and, for each sample voice recognition text in the sample voice recognition text set, inputting the text feature vector of the sample voice recognition text into the user intention recognition model to be trained and performing at least one round of training, until the difference vector between the category feature vector of the corresponding user intention category calculated using the determined attention information and the previously determined category feature vector of that user intention category meets the preset threshold requirement, at which point the loop stops and the trained user intention recognition model is obtained.
In some embodiments, inputting the text feature vector of the sample speech recognition text into a user intent recognition model to be trained, at least one round of training, comprises:
inputting text feature vectors of the sample voice recognition texts into a user intention recognition model to be trained aiming at each sample voice recognition text in the sample voice recognition text set, and calculating to obtain category feature vectors of a user intention category corresponding to the sample voice recognition text and intention information under the user intention category by using the determined attention degree information;
determining a difference vector between the calculated category feature vector and the determined category feature vector of the user intention category corresponding to the sample speech recognition text, and adjusting the attention information based on the difference vector;
and cyclically executing the steps of inputting the text feature vector of the sample voice recognition text into the user intention recognition model to be trained and calculating, using the adjusted attention information, the category feature vector of the corresponding user intention category and the corresponding intention information, until the determined difference vector meets the preset threshold requirement, at which point the loop stops and the trained user intention recognition model is obtained.
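Stated procedurally, the loop described in this embodiment resembles the following toy sketch, in which the "attention information" is reduced to a per-dimension weight vector and the stopping rule is an L2-norm threshold — both simplifying assumptions, not the patented model:

```python
import numpy as np

def train_until_threshold(text_vec, target_cat_vec, threshold=1e-3, lr=0.1, max_rounds=1000):
    # Toy "attention information": one weight per input dimension.
    attention = np.ones_like(text_vec)
    for _ in range(max_rounds):
        predicted = attention * text_vec     # category vector computed via attention
        diff = predicted - target_cat_vec    # difference vector
        if np.linalg.norm(diff) < threshold: # preset threshold requirement met: stop
            break
        attention -= lr * diff * text_vec    # adjust attention based on the difference
    return attention

text_vec = np.array([1.0, 2.0, 0.5])
target = np.array([0.5, 1.0, 0.25])  # determined category feature vector
att = train_until_threshold(text_vec, target)
# After convergence, att * text_vec approximates the target category vector.
```

The real model would adjust many parameters jointly across all sample texts; the sketch only shows the compute/compare/adjust/stop cycle the claim describes.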
In a second aspect, embodiments of the present application further provide a user intention recognition apparatus, where the apparatus includes:
the target acquisition module is used for acquiring target voice recognition texts;
the vector extraction module is used for extracting text feature vectors from the target voice recognition text;
the intention recognition module is used for inputting the extracted text feature vector into a pre-trained user intention recognition model and determining a user intention category corresponding to the target voice recognition text and intention information under the user intention category.
In one embodiment, the target acquisition module is specifically configured to:
responding to the acquired user voice, and determining a reference voice recognition text corresponding to the user voice;
and inputting the determined reference voice recognition text into a voice search engine to obtain a target voice recognition text corresponding to the reference voice recognition text.
In another embodiment, the vector extraction module is specifically configured to:
dividing the target voice recognition text sequentially to obtain a plurality of target voice recognition sub-texts;
sequentially inputting each target voice recognition sub-text into a pre-trained vector conversion model to obtain text feature vectors corresponding to each target voice recognition sub-text;
and combining the text feature vectors corresponding to all the target voice recognition sub-texts into the text feature vector of the target voice recognition text.
In yet another embodiment, the system further comprises a model training module comprising:
the sample acquisition unit is used for acquiring a sample voice recognition text set;
a category determining unit configured to extract, for each sample speech recognition text in the set of sample speech recognition texts, a text feature vector of the sample speech recognition text, and determine a category feature vector of a user intention category corresponding to the sample speech recognition text and intention information under the user intention category;
the model training unit is used for taking the text feature vector of the sample voice recognition text as input of a user intention recognition model to be trained, taking the category feature vector of the user intention category corresponding to the sample voice recognition text and the intention information under the user intention category as output of the user intention recognition model to be trained, and training to obtain the user intention recognition model.
In some embodiments, the category determining unit is specifically configured to:
determining a user intention category corresponding to each sample voice recognition text in the sample voice recognition text set;
determining a subset of sample speech recognition text corresponding to each user intent category, the subset of sample speech recognition text comprising at least one sample speech recognition text;
for each sample voice recognition text subset, determining a category feature vector of a user intention category corresponding to the sample voice recognition text subset according to text feature vectors of at least one sample voice recognition text included in the sample voice recognition text subset;
and taking the determined category characteristic vector of the user intention category corresponding to the sample voice recognition text subset as the category characteristic vector of the user intention category corresponding to any sample voice recognition text included in the sample voice recognition text subset.
In some embodiments, the category determining unit is specifically configured to:
for any sample voice recognition text, searching the sample voice recognition texts according to a preset text search strategy to obtain an expanded sample voice recognition text subset, where the user intention category corresponding to each sample voice recognition text in the expanded subset is the same as the user intention category corresponding to that sample voice recognition text;
determining a category feature vector of the user intention category corresponding to the expanded sample voice recognition text subset;
and taking the determined category feature vector of the user intention category corresponding to the expanded sample voice recognition text subset as the category feature vector of the user intention category corresponding to the sample voice recognition text.
In yet another embodiment, the model training unit is specifically configured to:
determining initial category feature vectors for each user intent category;
for each sample speech recognition text in the sample speech recognition text set, determining attention information between the sample speech recognition text and each user intention category based on a text feature vector of the sample speech recognition text and an initial category feature vector of each user intention category;
and, for each sample voice recognition text in the sample voice recognition text set, inputting the text feature vector of the sample voice recognition text into the user intention recognition model to be trained and performing at least one round of training, until the difference vector between the category feature vector of the corresponding user intention category calculated using the determined attention information and the previously determined category feature vector of that user intention category meets the preset threshold requirement, at which point the loop stops and the trained user intention recognition model is obtained.
In some embodiments, the model training unit is specifically configured to:
inputting text feature vectors of the sample voice recognition texts into a user intention recognition model to be trained aiming at each sample voice recognition text in the sample voice recognition text set, and calculating to obtain category feature vectors of a user intention category corresponding to the sample voice recognition text and intention information under the user intention category by using the determined attention degree information;
determining a difference vector between the calculated category feature vector and the determined category feature vector of the user intention category corresponding to the sample speech recognition text, and adjusting the attention information based on the difference vector;
and cyclically executing the steps of inputting the text feature vector of the sample voice recognition text into the user intention recognition model to be trained and calculating, using the adjusted attention information, the category feature vector of the corresponding user intention category and the corresponding intention information, until the determined difference vector meets the preset threshold requirement, at which point the loop stops and the trained user intention recognition model is obtained.
In a third aspect, an embodiment of the present application further provides an electronic device, including a processor, a storage medium, and a bus. The storage medium stores machine-readable instructions executable by the processor; when the electronic device runs, the processor and the storage medium communicate over the bus, and the processor executes the machine-readable instructions to perform the steps of the user intention recognition method described in the first aspect.
In a fourth aspect, embodiments of the present application also provide a computer readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the user intention recognition method according to the first aspect.
By adopting the above scheme, the text feature vector of a sample voice recognition text is used as the input, and the category feature vector of the corresponding user intention category together with the intention information under that category is used as the output, to train the user intention recognition model in advance, so that the user intention corresponding to a target voice recognition text can be recognized automatically by the pre-trained model. That is, the embodiments of the present application recognize the user intention through a trained user intention recognition model, avoiding the high labor cost of manual labeling while ensuring recognition accuracy.
In order to make the above objects, features and advantages of the present application more comprehensible, preferred embodiments accompanied with figures are described in detail below.
Drawings
To illustrate the technical solutions of the embodiments of the present application more clearly, the drawings needed in the embodiments are briefly described below. It should be understood that the following drawings illustrate only some embodiments of the present application and therefore should not be considered limiting of the scope; a person skilled in the art may obtain other related drawings from these drawings without inventive effort.
FIG. 1 is a flow chart of a user intent recognition method provided in accordance with an embodiment of the present application;
fig. 2 shows an application schematic diagram of a user intention recognition method according to an embodiment of the present application;
FIG. 3 is a flowchart of a method for identifying user intention according to a second embodiment of the present application;
FIG. 4 is a flowchart of another method for identifying user intention according to the second embodiment of the present application;
fig. 5 is a schematic structural diagram of a user intention recognition device according to a fourth embodiment of the present application;
fig. 6 shows a schematic structural diagram of an electronic device according to a fifth embodiment of the present application.
Detailed Description
To make the objects, technical solutions, and advantages of the embodiments of the present application clearer, the technical solutions are described below completely with reference to the accompanying drawings. It should be understood that the drawings in the present application are for illustration and description only and are not intended to limit the protection scope of the present application, and that the schematic drawings are not drawn to scale. Flowcharts used in this application illustrate operations implemented according to some embodiments; the operations of a flowchart may be implemented out of order, and steps without a logical dependency may be performed in reverse order or concurrently. Moreover, under the direction of this disclosure, those skilled in the art may add one or more other operations to, or remove one or more operations from, a flowchart.
In addition, the described embodiments are only some, but not all, of the embodiments of the present application. The components of the embodiments of the present application, which are generally described and illustrated in the figures herein, may be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present application, as provided in the accompanying drawings, is not intended to limit the scope of the application, as claimed, but is merely representative of selected embodiments of the application. All other embodiments, which can be made by those skilled in the art based on the embodiments of the present application without making any inventive effort, are intended to be within the scope of the present application.
Considering that the related method realizes user intention recognition by manual labeling and therefore consumes a great deal of manpower, the embodiments of the present application provide a user intention recognition method, which can be applied to technical fields such as music search and online ticket booking. Several embodiments are described in detail below.
Example 1
As shown in Fig. 1, which is a flowchart of the user intention recognition method provided in Embodiment 1 of the present application, the method may be executed by an electronic device. The electronic device may be a private device, an operator network device (e.g., a base station device), or a site (station) deployed by an industry organization, a group, or an individual, including but not limited to a mobile station, a mobile terminal, a user device, a portable device, or a vehicle-mounted device. For example, it may be a mobile phone (or "cellular" phone) or a computer with a wireless communication function, and may be portable, pocket-sized, handheld, computer-built-in, or vehicle-mounted. The user intention recognition method specifically includes the following steps:
S101, acquiring target voice recognition text.
Here, the target voice recognition text may be text obtained by performing voice recognition on the user's voice. In this embodiment of the present application, a voice recognition system may be used: the user's voice is analyzed and processed by a feature parameter extraction unit, redundant information in the rich voice signal is removed to obtain information useful for voice recognition, and the obtained information is then recognized by a pattern matching and model training unit to obtain the voice recognition text.
Considering that existing voice recognition systems are sensitive to the environment, targeted voice training is usually required for each scene, and the recognized text usually contains various noise. Therefore, after a reference voice recognition text is obtained through the voice recognition system, the embodiment of the present application can calibrate the reference voice recognition text based on a voice search engine to obtain the target voice recognition text.
Specifically, the target voice recognition text corresponding to the reference voice recognition text can be determined based on the degree of text matching between the reference voice recognition text obtained by voice recognition and each target voice recognition text in the database used by the voice search engine, so that an accurate voice search can still be performed even when noise interferes in the voice recognition stage. For example, if "songs by Chen Mouxun" is misrecognized during voice recognition as "songs by Chen Moubi", the correct voice recognition text, namely "songs by Chen Mouxun", can still be fed back after the voice search.
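The matching-degree calibration can be illustrated with a simple string-similarity search. Here `difflib.SequenceMatcher` is a stand-in for the engine's unspecified matching algorithm, and the database entries are invented for illustration:

```python
import difflib

def best_match(reference_text, database_texts):
    """Return the database text most similar to the (possibly noisy) reference."""
    return max(database_texts,
               key=lambda t: difflib.SequenceMatcher(None, reference_text, t).ratio())

# Toy search-engine database of known target voice recognition texts.
database = ["songs by Chen Mouxun", "weather in Shanghai", "book a flight ticket"]

# A noisy recognition result still matches the intended database entry.
target = best_match("songs by Chen Moubi", database)
```

Any monotone similarity score could replace `ratio()` here; the point is only that calibration reduces to a nearest-neighbor lookup over the engine's database.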
S102, extracting text feature vectors from the target voice recognition text.
Here, after the target voice recognition text is acquired, the text, which is natural language, may be converted into a text feature vector in digital form using a vectorization method such as word2vec, to facilitate machine processing; this conversion is referred to as encoding (Encoder).
Considering that the target voice recognition text is composed of a plurality of words, when extracting the text feature vector in this embodiment, text division can be performed first, and the corresponding text feature vector is then determined based on the division result and a pre-trained vector conversion model. Specifically, the target voice recognition text may first be divided sequentially to obtain a plurality of target voice recognition sub-texts; each sub-text is then input in turn into the pre-trained vector conversion model to obtain the text feature vector corresponding to each sub-text; finally, the text feature vectors corresponding to all the sub-texts are combined into the text feature vector of the target voice recognition text.
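A minimal sketch of this divide-embed-combine procedure, with whitespace tokenization and a toy lookup table standing in for the trained vector conversion model (the words follow the "from Beijing to Shanghai" example used later; the 2-d vectors are invented):

```python
import numpy as np

# Toy stand-in for a pre-trained vector conversion model: a word-to-vector table.
embedding_table = {
    "from":     np.array([1.0, 0.0]),
    "Beijing":  np.array([0.0, 1.0]),
    "to":       np.array([1.0, 1.0]),
    "Shanghai": np.array([0.5, 0.5]),
}

def text_feature_vectors(text):
    sub_texts = text.split()                            # sequential division into sub-texts
    per_token = [embedding_table[w] for w in sub_texts] # one vector per sub-text
    return np.stack(per_token)                          # combined text feature vector

features = text_feature_vectors("from Beijing to Shanghai")  # shape (4, 2)
```

In a real system the table would come from a trained model (e.g. word2vec), and the division step would use a proper tokenizer rather than `split()`.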
In this embodiment of the present application, the vector conversion model may use a one-hot representation (One-hot Representation), a distributed representation (Distributed Representation), or any other model capable of converting text into vectors. The former represents each word as a long vector whose length is the dictionary size N; exactly one dimension is 1 and all others are 0, and the position of the 1 indicates the position of the word in the dictionary. That is, the one-hot model stores word information sparsely, assigning each word a digital identifier, and the representation is relatively simple. The latter derives a semantic representation from context information: words that appear in the same contexts have similar semantics. That is, the distributed model stores word information densely, and the representation is relatively complex. Considering that the one-hot model often suffers from the curse of dimensionality in practical problems and cannot reveal latent relations between words, the distributed representation can be adopted in practice to represent the target voice recognition text as vectors, avoiding the dimensionality problem and mining the associated attributes between words, thereby improving the accuracy of semantic expression.
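The one-hot scheme described above can be shown in a few lines (a toy four-word dictionary is assumed; a distributed representation would instead be learned from context, e.g. by word2vec):

```python
def one_hot(word, dictionary):
    """One-hot representation: a |dictionary|-long vector with a single 1
    at the word's position in the dictionary."""
    vec = [0] * len(dictionary)
    vec[dictionary.index(word)] = 1
    return vec

dictionary = ["from", "Beijing", "to", "Shanghai"]
one_hot("to", dictionary)  # → [0, 0, 1, 0]
```

With a real dictionary of tens of thousands of words, each vector would have that many dimensions with a single 1, which is exactly the sparsity and dimensionality problem the paragraph above describes.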
S103, inputting the extracted text feature vector into a pre-trained user intention recognition model, and determining a user intention category corresponding to the target voice recognition text and intention information under the user intention category.
Here, the user intention recognition model may be pre-trained. The embodiment of the present application may use a neural network model as the user intention recognition model; the model training stage is then a process of learning the unknown parameters of that network. Once trained, the model can recognize the user intention category corresponding to the target speech recognition text and the intention information under that category: only the text feature vector extracted from the target speech recognition text needs to be input into the trained user intention recognition model.
In order to facilitate understanding of the user intention recognition method provided in the embodiment of the present application, a specific description is provided below in connection with fig. 2.
As shown in fig. 2, for the target speech recognition text "from beijing to shanghai", text division yields four words — "from", "beijing", "to", "shanghai" — each of which is a target speech recognition sub-text. X1, X2, X3, X4 are the text feature vectors corresponding to these four sub-texts. s and c1, c2, c3, c4 are parameter information in model training, and h1, h2, h3, h4 represent the outputs of the relevant hidden layers during model training. Thus, after X1, X2, X3, X4 are input as text feature vectors into the trained user intention recognition model, the user intention category corresponding to the target speech recognition text "from beijing to shanghai", namely "flying", and the intention information "beijing" and "shanghai" under the user intention category "flying" are obtained.
In the embodiment of the present application, the training process of the user intention recognition model is a key step of the user intention recognition method provided in the embodiment of the present application. Next, a training process of the user intention recognition model will be specifically described by the following second embodiment.
Example two
As shown in fig. 3, a flowchart of a method for training a user intention recognition model according to an embodiment of the present application is provided, where the method is specifically implemented by the following steps:
S301, acquiring a sample voice recognition text set;
S302, for each sample voice recognition text in the sample voice recognition text set, extracting the text feature vector of the sample voice recognition text, and determining the category feature vector of the user intention category corresponding to the sample voice recognition text and the intention information under the user intention category;
S303, taking the text feature vector of the sample voice recognition text as input of a user intention recognition model to be trained, taking the category feature vector of the user intention category corresponding to the sample voice recognition text and the intention information under the user intention category as output of the user intention recognition model to be trained, and training to obtain the user intention recognition model.
Here, in the training stage of the user intention recognition model, the text feature vector of the sample voice recognition text may be used as input of the user intention recognition model to be trained, the category feature vector of the user intention category corresponding to the sample voice recognition text and the intention information under the user intention category may be used as output, and the parameter information of the user intention recognition model may be obtained through training, that is, the trained user intention recognition model may be obtained.
In the embodiment of the application, the user intention recognition model maps an input text to an output category and the intention information under that category. The embodiment may employ a bidirectional recurrent neural network (Recurrent Neural Network, RNN) for model training. That is, through repeated iterative learning, the bidirectional RNN gradually learns how to derive, from a sample speech recognition text, the corresponding user intention category and the intention information under that user intention category.
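A minimal sketch of a bidirectional RNN encoder is given below; the random weight matrices stand in for parameters a trained model would learn, and the dimensions are assumptions:

```python
import numpy as np

# Illustrative bidirectional RNN encoder; random weights stand in for trained ones.
rng = np.random.default_rng(2)
D, H = 4, 5  # input and hidden sizes (assumed)
W_f, U_f = rng.standard_normal((H, D)), rng.standard_normal((H, H))
W_b, U_b = rng.standard_normal((H, D)), rng.standard_normal((H, H))

def birnn(xs):
    """One hidden output h_t per input, concatenating forward and backward states."""
    fwd, h = [], np.zeros(H)
    for x in xs:                      # left-to-right pass
        h = np.tanh(W_f @ x + U_f @ h)
        fwd.append(h)
    bwd, h = [], np.zeros(H)
    for x in reversed(xs):            # right-to-left pass
        h = np.tanh(W_b @ x + U_b @ h)
        bwd.append(h)
    return [np.concatenate([f, b]) for f, b in zip(fwd, reversed(bwd))]

xs = [rng.standard_normal(D) for _ in range(4)]  # X1..X4 from the fig. 2 example
hs = birnn(xs)
print(len(hs), hs[0].shape)  # 4 (10,)
```

The outputs correspond to the hidden-layer outputs h1..h4 mentioned in the fig. 2 description; a classification head over them would produce the category and intention information.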
Similarly to the target speech recognition text, each sample speech recognition text in the sample speech recognition text set may be obtained by recognizing and calibrating the speech of a sample user, ensuring that every sample speech recognition text is correct. Correspondingly, the extraction of text feature vectors from a sample speech recognition text is similar to that for the target speech recognition text and is not repeated here. Before training the user intention recognition model, the embodiment of the present application may determine in advance the category feature vector of the user intention category corresponding to each sample speech recognition text and the intention information under that category, so that training can be performed based on the text feature vector of the sample speech recognition text together with these quantities. A specific implementation of model training is shown in fig. 4 and includes the following steps:
S401, determining initial category feature vectors of each user intention category;
S402, determining attention degree information between each sample voice recognition text and each user intention type based on text feature vectors of the sample voice recognition text and initial type feature vectors of the user intention types aiming at each sample voice recognition text in the sample voice recognition text set;
S403, for each sample voice recognition text in the sample voice recognition text set, inputting the text feature vector of the sample voice recognition text into the user intention recognition model to be trained for at least one round of training; the loop stops, and the user intention recognition model is obtained, when the difference vector between the category feature vector of the corresponding user intention category calculated using the determined attention degree information and the predetermined category feature vector of that user intention category meets a preset threshold requirement.
The user intention recognition method provided by the embodiment of the application can firstly determine the initial category feature vector of each user intention category, then determine the attention degree information between the sample voice recognition text and each user intention category based on the text feature vector of the sample voice recognition text and the initial category feature vector of each user intention category, finally input the text feature vector of the sample voice recognition text into the user intention recognition model to be trained, and perform at least one round of training of the user intention recognition model to train to obtain the user intention recognition model.
Throughout model training, the embodiment of the application considers the influence of each user intention category on every sample voice recognition text: the better a sample voice recognition text matches a user intention category, the higher the corresponding attention. Therefore, through a selective attention mechanism, sample voice recognition texts with mislabeled categories effectively receive lower attention, while correctly labeled ones receive higher attention, which further improves the accuracy of user intention recognition.
Wherein the attention degree information between the sample speech recognition text and any user intention category may be determined based on the following formula:

α_i = exp(e_i) / Σ_k exp(e_k) (1)

wherein α_i is used to represent the attention degree information between the sample speech recognition text and any user intention category, and e_i represents the similarity between the sample speech recognition text and that user intention category, which may be determined using the following formula:

e_i = x_i A r (2)

wherein x_i is used to represent the text feature vector of the sample speech recognition text, r is used to represent the category feature vector of the user intention category, and A is used to represent a weighted diagonal matrix.
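The two formulas can be exercised numerically as follows; the sizes and random values are illustrative:

```python
import numpy as np

# Numerical sketch of formulas (1) and (2): e_i = x_i A r, then a softmax over
# the similarity scores e_i gives the attention weights α_i.
rng = np.random.default_rng(3)
D = 4
A = np.diag(rng.standard_normal(D))   # weighted diagonal matrix A
r = rng.standard_normal(D)            # category feature vector r of one intent category
xs = rng.standard_normal((3, D))      # text feature vectors x_i of three sample texts

e = xs @ A @ r                        # formula (2), computed for every sample at once
alpha = np.exp(e) / np.exp(e).sum()   # formula (1): higher similarity, higher attention
print(alpha.sum())                    # attention weights sum to 1
```

Because the weights are normalized, a mislabeled sample whose text is dissimilar to the category naturally receives a small α_i, which is the selective-attention effect described above.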
In summary, for any user intention category, the greater the similarity between a sample speech recognition text and that category, the higher the attention of the text to the category; the output of the finally trained user intention recognition model then also tends toward that category, the intention information under the category can be determined, and the accuracy of intention recognition is higher.
In the embodiment of the application, the user intention recognition model can be obtained through at least one round of training. That is, for each sample speech recognition text in the sample speech recognition text set, its text feature vector is first input into the user intention recognition model to be trained, and the category feature vector of the corresponding user intention category and the intention information under that category are calculated using the determined attention degree information. A difference vector between the calculated category feature vector and the predetermined category feature vector of the corresponding user intention category is then determined, and the attention degree information is adjusted based on that difference vector. Finally, the text feature vector is input into the model again and the category feature vector and corresponding intention information are recalculated with the adjusted attention degree information; this loop repeats until the determined difference vector meets the preset threshold requirement, at which point training stops and the user intention recognition model is obtained.
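The adjust-and-recompute loop can be sketched schematically; the gradient-style update rule and the choice of the labeled category vector are illustrative assumptions, not the patent's exact procedure:

```python
import numpy as np

# Schematic of the loop: predict the category vector from the current attention,
# compare with the labeled category vector, adjust the attention scores, repeat.
# The update rule and mean-vector label below are illustrative assumptions.
rng = np.random.default_rng(4)
xs = rng.standard_normal((3, 4))      # text feature vectors of three samples
target = xs.mean(axis=0)              # predetermined category feature vector (assumed)
e = rng.standard_normal(3)            # similarity scores behind the attention weights

alpha = np.exp(e) / np.exp(e).sum()
initial = np.linalg.norm(alpha @ xs - target)
for _ in range(500):
    alpha = np.exp(e) / np.exp(e).sum()
    predicted = alpha @ xs                        # category vector via attention
    diff = predicted - target                     # difference vector
    if np.linalg.norm(diff) < 1e-3:               # preset threshold requirement
        break
    e -= 0.1 * alpha * ((xs - predicted) @ diff)  # adjust attention scores
print(np.linalg.norm(diff) < initial)
```

The loop exits once the difference vector is small enough, mirroring the "stop circulating when the threshold requirement is met" condition of S403.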
Here, the attention degree information between a sample speech recognition text and each user intention category is continuously adjusted according to the category feature vector output by the previous round of the model. In the application scenario of fig. 2, this attention degree information can be regarded as the parameter information s in the trained user intention recognition model. When determining the user intention category, the adjusted attention degree information participates in model training at text granularity; when determining the intention information corresponding to the user intention category, it participates at vocabulary granularity. Therefore, the user intention recognition model trained in the embodiment of the application can recognize not only the user intention category but also the intention information under a specific user intention category, and thus has better practicability.
In the embodiment of the present application, two main ways of determining the category feature vector of the user intention category corresponding to the sample speech recognition text are provided; see the following Example III.
Example III
First aspect: for each sample voice recognition text in the sample voice recognition text set, the embodiment of the application may first determine the user intention category corresponding to that text, and then determine the sample voice recognition text subset corresponding to each user intention category, where each subset includes at least one sample voice recognition text. For each subset, a category feature vector of the corresponding user intention category is determined according to the text feature vectors of the at least one sample voice recognition text it includes. Finally, that category feature vector is used as the category feature vector of the user intention category corresponding to every sample voice recognition text included in the subset.
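One simple realization of this first aspect is to average the text feature vectors within each subset; averaging is an assumed aggregation, since the embodiment only requires that the category feature vector be determined according to the subset's text feature vectors:

```python
import numpy as np

# Hypothetical aggregation for the first aspect: group sample texts by labeled
# intent category and average their text feature vectors (averaging is an assumed
# choice, not mandated by the embodiment).
subsets = {
    "fly":   [np.array([1.0, 0.0]), np.array([0.8, 0.2])],  # subset for one category
    "music": [np.array([0.0, 1.0])],                        # a subset may hold one text
}
category_vectors = {c: np.mean(vs, axis=0) for c, vs in subsets.items()}
print(category_vectors["fly"])  # [0.9 0.1] — shared by every text in the subset
```

Every sample text in a subset then shares the same category feature vector, as the final step of the first aspect requires.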
Second aspect: for any sample voice recognition text, the embodiment of the application may search on that text according to a preset text search policy to obtain an expanded sample voice recognition text subset, where the user intention category corresponding to each text in the expanded subset is the same as the user intention category corresponding to the original text. The category feature vector of the user intention category corresponding to the expanded subset is then determined, and finally used as the category feature vector of the user intention category corresponding to the sample voice recognition text.
The sample extension described above is specifically described as follows:
For example, in the application field of music knowledge graphs, the obtainable sample speech recognition texts are limited. When the sample speech recognition text is "I want to listen to the X song of ABC", it can be input into a text search engine; the search results may be "live version of the X song of ABC", another song of ABC, and so on, and the user intention category corresponding to these search results is the same as that of "I want to listen to the X song of ABC". This expands the sample speech recognition texts to a great extent and has good practicability.
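The expansion idea can be sketched with a toy search over a tiny corpus; the corpus, the substring matching, and the category name are illustrative stand-ins for a real search engine and text search policy:

```python
# Toy sketch of the second aspect: texts retrieved for a sample query inherit its
# user intention category. Corpus and matching rule are illustrative stand-ins.
corpus = [
    "live version of the X song of ABC",
    "another song of ABC",
    "weather in beijing today",
]

def expand(sample_text: str, category: str, keyword: str):
    # Return the expanded subset: (retrieved text, same intention category) pairs.
    return [(t, category) for t in corpus if keyword in t]

expanded = expand("I want to listen to the X song of ABC", "play_music", "ABC")
print(len(expanded))  # 2 retrieved texts join the expanded subset
```

The retrieved texts share the query's intention category, so the expanded subset can supply additional text feature vectors for computing the category feature vector.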
Based on the above embodiments, the present application further provides a user intention recognition device, and the implementation of the following various devices may refer to the implementation of the above method, and the repetition is omitted.
Example IV
As shown in fig. 5, a user intention recognition device provided in a fourth embodiment of the present application includes:
a target obtaining module 501, configured to obtain a target speech recognition text;
a vector extraction module 502, configured to extract a text feature vector from the target speech recognition text;
the intention recognition module 503 is configured to input the extracted text feature vector into a pre-trained user intention recognition model, and determine a user intention category corresponding to the target speech recognition text and intention information under the user intention category.
In one embodiment, the target obtaining module 501 is specifically configured to:
responding to the acquired user voice, and determining a reference voice recognition text corresponding to the user voice;
and inputting the determined reference voice recognition text into a search engine to obtain a target voice recognition text corresponding to the reference voice recognition text.
In another embodiment, the vector extraction module 502 is specifically configured to:
sequentially dividing the target voice recognition texts to obtain a plurality of target voice recognition sub-texts;
sequentially inputting each target voice recognition sub-text into a pre-trained vector conversion model to obtain text feature vectors corresponding to each target voice recognition sub-text;
and combining the text feature vectors corresponding to all the target voice recognition sub-texts into the text feature vectors of the target voice recognition texts.
In yet another embodiment, a model training module 504 is further included, the model training module 504 including:
the sample acquisition unit is used for acquiring a sample voice recognition text set;
a category determining unit configured to extract, for each sample speech recognition text in the set of sample speech recognition texts, a text feature vector of the sample speech recognition text, and determine a category feature vector of a user intention category corresponding to the sample speech recognition text and intention information under the user intention category;
The model training unit is used for taking the text feature vector of the sample voice recognition text as input of a user intention recognition model to be trained, taking the category feature vector of the user intention category corresponding to the sample voice recognition text and the intention information under the user intention category as output of the user intention recognition model to be trained, and training to obtain the user intention recognition model.
In some embodiments, the category determining unit is specifically configured to:
determining a user intention category corresponding to each sample voice recognition text in the sample voice recognition text set;
determining a subset of sample speech recognition text corresponding to each user intent category, the subset of sample speech recognition text comprising at least one sample speech recognition text;
for each sample voice recognition text subset, determining a category feature vector of a user intention category corresponding to the sample voice recognition text subset according to text feature vectors of at least one sample voice recognition text included in the sample voice recognition text subset;
and taking the determined category characteristic vector of the user intention category corresponding to the sample voice recognition text subset as the category characteristic vector of the user intention category corresponding to any sample voice recognition text included in the sample voice recognition text subset.
In some embodiments, the category determining unit is specifically configured to:
searching the sample voice recognition texts according to a preset text searching strategy aiming at any sample voice recognition text to obtain an expanded sample voice recognition text subset, wherein the user intention category corresponding to the sample voice recognition text in the expanded sample voice recognition text subset is the same as the user intention category corresponding to any sample voice recognition text;
determining a category characteristic vector of a user intention category corresponding to the expanded sample voice recognition text subset;
and taking the determined category characteristic vector of the user intention category corresponding to the expanded sample voice recognition text subset as the category characteristic vector of the user intention category corresponding to the sample voice recognition text.
In yet another embodiment, the model training unit is specifically configured to:
determining initial category feature vectors for each user intent category;
for each sample speech recognition text in the sample speech recognition text set, determining attention information between the sample speech recognition text and each user intention category based on a text feature vector of the sample speech recognition text and an initial category feature vector of each user intention category;
For each sample voice recognition text in the sample voice recognition text set, inputting the text feature vector of the sample voice recognition text into the user intention recognition model to be trained for at least one round of training; the loop stops, and the user intention recognition model is obtained, when the difference vector between the category feature vector of the corresponding user intention category calculated using the determined attention degree information and the predetermined category feature vector of that user intention category meets the preset threshold requirement.
In some embodiments, the model training unit is specifically configured to:
inputting text feature vectors of the sample voice recognition texts into a user intention recognition model to be trained aiming at each sample voice recognition text in the sample voice recognition text set, and calculating to obtain category feature vectors of a user intention category corresponding to the sample voice recognition text and intention information under the user intention category by using the determined attention degree information;
determining a difference vector between the calculated category feature vector and the determined category feature vector of the user intention category corresponding to the sample speech recognition text, and adjusting the attention information based on the difference vector;
Circularly executing the steps of inputting the text feature vector of the sample voice recognition text into the user intention recognition model to be trained and calculating, with the adjusted attention degree information, the category feature vector of the corresponding user intention category and the corresponding intention information, until the determined difference vector meets the preset threshold requirement, at which point the loop stops and the user intention recognition model is obtained.
Example five
As shown in fig. 6, a schematic structural diagram of an electronic device according to a fifth embodiment of the present application includes: a processor 601, a storage medium 602, and a bus 603. The storage medium 602 stores machine-readable instructions executable by the processor 601; when the electronic device runs, the processor 601 communicates with the storage medium 602 via the bus 603, and the machine-readable instructions, when executed by the processor 601, perform the following process:
acquiring a target voice recognition text;
extracting text feature vectors from the target voice recognition text;
and inputting the extracted text feature vector into a pre-trained user intention recognition model, and determining a user intention category corresponding to the target voice recognition text and intention information under the user intention category.
In one embodiment, in the process performed by the processor 601, the target speech recognition text is determined according to the following steps:
responding to the acquired user voice, and determining a reference voice recognition text corresponding to the user voice;
and inputting the determined reference voice recognition text into a search engine to obtain a target voice recognition text corresponding to the reference voice recognition text.
In another embodiment, in the processing performed by the processor 601, extracting a text feature vector from the target speech recognition text includes:
sequentially dividing the target voice recognition texts to obtain a plurality of target voice recognition sub-texts;
sequentially inputting each target voice recognition sub-text into a pre-trained vector conversion model to obtain text feature vectors corresponding to each target voice recognition sub-text;
and combining the text feature vectors corresponding to all the target voice recognition sub-texts into the text feature vectors of the target voice recognition texts.
In yet another embodiment, in the process performed by the processor 601, the user intention recognition model is trained according to the following steps:
acquiring a sample voice recognition text set;
Extracting text feature vectors of the sample voice recognition texts aiming at each sample voice recognition text in the sample voice recognition text set, and determining category feature vectors of user intention categories corresponding to the sample voice recognition texts and intention information under the user intention categories;
and taking the text feature vector of the sample voice recognition text as input of a user intention recognition model to be trained, taking the category feature vector of the user intention category corresponding to the sample voice recognition text and the intention information under the user intention category as output of the user intention recognition model to be trained, and training to obtain the user intention recognition model.
In some embodiments, in the processing performed by the processor 601, after obtaining the sample speech recognition text set, before determining the category feature vector of the user intention category corresponding to the sample speech recognition text, the processing further includes:
determining a user intention category corresponding to each sample voice recognition text in the sample voice recognition text set;
determining a subset of sample speech recognition text corresponding to each user intent category, the subset of sample speech recognition text comprising at least one sample speech recognition text;
For each sample voice recognition text subset, determining a category feature vector of a user intention category corresponding to the sample voice recognition text subset according to text feature vectors of at least one sample voice recognition text included in the sample voice recognition text subset;
in the processing performed by the processor 601, determining a category feature vector of a user intention category corresponding to the sample speech recognition text includes:
and taking the determined category characteristic vector of the user intention category corresponding to the sample voice recognition text subset as the category characteristic vector of the user intention category corresponding to any sample voice recognition text included in the sample voice recognition text subset.
In some embodiments, in the processing performed by the processor 601, after obtaining the sample speech recognition text set, before determining the category feature vector of the user intention category corresponding to the sample speech recognition text, the processing further includes:
searching the sample voice recognition texts according to a preset text searching strategy aiming at any sample voice recognition text to obtain an expanded sample voice recognition text subset, wherein the user intention category corresponding to the sample voice recognition text in the expanded sample voice recognition text subset is the same as the user intention category corresponding to any sample voice recognition text;
Determining a category characteristic vector of a user intention category corresponding to the expanded sample voice recognition text subset;
in the processing performed by the processor 601, determining a category feature vector of a user intention category corresponding to the sample speech recognition text includes:
and taking the determined category characteristic vector of the user intention category corresponding to the expanded sample voice recognition text subset as the category characteristic vector of the user intention category corresponding to the sample voice recognition text.
In still another embodiment, in the processing performed by the processor 601, the training to obtain the user intention recognition model includes:
determining initial category feature vectors for each user intent category;
for each sample speech recognition text in the sample speech recognition text set, determining attention information between the sample speech recognition text and each user intention category based on a text feature vector of the sample speech recognition text and an initial category feature vector of each user intention category;
For each sample voice recognition text in the sample voice recognition text set, inputting the text feature vector of the sample voice recognition text into the user intention recognition model to be trained for at least one round of training; the loop stops, and the user intention recognition model is obtained, when the difference vector between the category feature vector of the corresponding user intention category calculated using the determined attention degree information and the predetermined category feature vector of that user intention category meets the preset threshold requirement.
In some embodiments, in the processing performed by the processor 601, the text feature vector of the sample speech recognition text is input into the user intention recognition model to be trained, and at least one training is performed, including:
inputting text feature vectors of the sample voice recognition texts into a user intention recognition model to be trained aiming at each sample voice recognition text in the sample voice recognition text set, and calculating to obtain category feature vectors of a user intention category corresponding to the sample voice recognition text and intention information under the user intention category by using the determined attention degree information;
Determining a difference vector between the calculated category feature vector and the determined category feature vector of the user intention category corresponding to the sample speech recognition text, and adjusting the attention information based on the difference vector;
and circularly executing the steps of inputting the text feature vector of the sample voice recognition text into the user intention recognition model to be trained and calculating, with the adjusted attention degree information, the category feature vector of the corresponding user intention category and the corresponding intention information, until the determined difference vector meets the preset threshold requirement, at which point the loop stops and the user intention recognition model is obtained.
Example six
The sixth embodiment of the present application further provides a computer readable storage medium, where a computer program is stored, and the computer program is executed by a processor to perform the steps of the user intention recognition method corresponding to the foregoing embodiment.
Specifically, the storage medium may be a general storage medium, such as a removable disk or a hard disk. When the computer program on the storage medium is run, the above user intention recognition method can be executed, which addresses the currently high labor cost of recognizing user intention and achieves the effect of reducing labor cost while ensuring recognition accuracy.
Based on the same technical concept, the embodiments of the present application further provide a computer program product comprising a computer-readable storage medium that stores program code. The program code includes instructions for executing the steps of the user intention recognition method described above; for the specific implementation, refer to the method embodiments, which are not repeated here.
Those skilled in the art will clearly understand that, for convenience and brevity of description, the specific working procedures of the systems and apparatus described above may refer to the corresponding procedures in the method embodiments and are not detailed in this application. In the several embodiments provided in this application, it should be understood that the disclosed systems, devices, and methods may be implemented in other ways. The apparatus embodiments described above are merely illustrative: the division into modules is only a logical functional division, and other divisions are possible in practice; for example, multiple modules or components may be combined or integrated into another system, or some features may be omitted or not performed. Furthermore, the mutual couplings, direct couplings, or communication connections shown or discussed may be implemented through communication interfaces; the indirect couplings or communication connections between devices or modules may be electrical, mechanical, or take other forms.
The modules described as separate components may or may not be physically separate, and components shown as modules may or may not be physical units, may be located in one place, or may be distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in each embodiment of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit.
If the functions are implemented in the form of software functional units and sold or used as a standalone product, they may be stored in a processor-executable non-volatile computer-readable storage medium. Based on this understanding, the technical solution of the present application, in essence or in the part that contributes to the prior art, may be embodied in the form of a software product. The software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a ROM, a RAM, a magnetic disk, or an optical disk.
The foregoing is merely a specific embodiment of the present application, but the protection scope of the present application is not limited thereto. Any changes or substitutions that a person skilled in the art could readily conceive of within the technical scope disclosed in the present application shall be covered by the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.
Claims (18)
1. A user intention recognition method, the method comprising:
acquiring a target speech recognition text;
extracting a text feature vector from the target speech recognition text;
inputting the extracted text feature vector into a pre-trained user intention recognition model, and determining a user intention category corresponding to the target speech recognition text and intention information under the user intention category; wherein the user intention recognition model is trained based on the text feature vector of each sample speech recognition text and the attention information between each sample speech recognition text and each user intention category; and, for each sample speech recognition text, the attention information between the sample speech recognition text and each user intention category is determined based on the similarity between the sample speech recognition text and each user intention category;
wherein the similarity between the sample speech recognition text and any user intention category is determined by the following formula:

e_i = x_i A r

where x_i denotes the text feature vector of the sample speech recognition text, r denotes the category feature vector of the user intention category, and A denotes a weighted diagonal matrix.
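The bilinear similarity e_i = x_i A r can be sketched in a few lines of NumPy. This is an illustrative reconstruction, not the patented implementation: the dimensionality, the example vectors, and the softmax normalization of similarities into attention weights are all assumptions (the claim only states that attention is determined based on the similarity).

```python
import numpy as np

def bilinear_similarity(x_i, A_diag, r):
    """e_i = x_i A r, with A a weighted diagonal matrix.

    Storing only A's diagonal avoids materializing the full d x d matrix.
    """
    return float(x_i @ np.diag(A_diag) @ r)

d = 4
x_i = np.array([0.5, 1.0, -0.5, 0.25])  # text feature vector of a sample text
A_diag = np.ones(d)                      # diagonal weights (learned in practice)
categories = {                           # category feature vectors (assumed values)
    "navigate": np.array([0.4, 0.9, -0.4, 0.3]),
    "play_music": np.array([-0.6, 0.1, 0.8, -0.2]),
}

scores = {name: bilinear_similarity(x_i, A_diag, r) for name, r in categories.items()}
# Turn similarities into attention weights via softmax -- an assumed choice.
exp_scores = {k: np.exp(v) for k, v in scores.items()}
total = sum(exp_scores.values())
attention = {k: v / total for k, v in exp_scores.items()}
```

Here the sample vector is closest to the "navigate" category vector, so that category receives the highest similarity and the largest attention weight.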
2. The method of claim 1, wherein the target speech recognition text is determined according to the following steps:
in response to acquiring a user voice, determining a reference speech recognition text corresponding to the user voice;
and inputting the determined reference speech recognition text into a voice search engine to obtain the target speech recognition text corresponding to the reference speech recognition text.
3. The method of claim 1, wherein extracting the text feature vector from the target speech recognition text comprises:
sequentially segmenting the target speech recognition text to obtain a plurality of target speech recognition sub-texts;
sequentially inputting each target speech recognition sub-text into a pre-trained vector conversion model to obtain the text feature vector corresponding to each target speech recognition sub-text;
and combining the text feature vectors corresponding to all the target speech recognition sub-texts into the text feature vector of the target speech recognition text.
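The segment-then-combine extraction in claim 3 can be illustrated as follows. The two-character segmentation granularity, the CRC32-seeded toy embedding standing in for the pre-trained vector conversion model, and mean pooling as the combination step are all assumptions made purely for illustration.

```python
import zlib
import numpy as np

def segment(text, size=2):
    """Sequentially divide the text into sub-texts of `size` characters (assumed granularity)."""
    return [text[i:i + size] for i in range(0, len(text), size)]

def toy_vector_conversion(sub_text, dim=8):
    """Stand-in for the pre-trained vector conversion model: a deterministic
    toy embedding seeded from a CRC32 of the sub-text."""
    rng = np.random.default_rng(zlib.crc32(sub_text.encode("utf-8")))
    return rng.standard_normal(dim)

def text_feature_vector(text, dim=8):
    subs = segment(text)
    sub_vectors = [toy_vector_conversion(s, dim) for s in subs]
    # Combine the per-sub-text vectors into one text feature vector; mean
    # pooling is one simple choice -- the claim only says they are "combined".
    return np.mean(sub_vectors, axis=0)

vec = text_feature_vector("play some jazz")
```

The pipeline is deterministic, so the same input text always yields the same feature vector, which is what the downstream similarity computation requires.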
4. The method of claim 1, wherein the user intention recognition model is trained according to the following steps:
acquiring a sample speech recognition text set;
for each sample speech recognition text in the sample speech recognition text set, extracting the text feature vector of the sample speech recognition text, and determining the category feature vector of the user intention category corresponding to the sample speech recognition text and the intention information under the user intention category;
and taking the text feature vector of the sample speech recognition text as the input of the user intention recognition model to be trained, taking the category feature vector of the user intention category corresponding to the sample speech recognition text and the intention information under the user intention category as the output of the user intention recognition model to be trained, and training to obtain the user intention recognition model.
5. The method of claim 4, wherein after acquiring the sample speech recognition text set and before determining the category feature vector of the user intention category corresponding to the sample speech recognition text, the method further comprises:
determining the user intention category corresponding to each sample speech recognition text in the sample speech recognition text set;
determining the sample speech recognition text subset corresponding to each user intention category, each sample speech recognition text subset comprising at least one sample speech recognition text;
for each sample speech recognition text subset, determining the category feature vector of the user intention category corresponding to the subset according to the text feature vectors of the at least one sample speech recognition text included in the subset;
wherein determining the category feature vector of the user intention category corresponding to the sample speech recognition text comprises:
taking the determined category feature vector of the user intention category corresponding to the sample speech recognition text subset as the category feature vector of the user intention category corresponding to any sample speech recognition text included in the subset.
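One way to realize the subset-to-category aggregation in claim 5 is to average the text feature vectors of each category's subset. Averaging is an assumption: the claim only requires that the category feature vector be determined from the subset's text feature vectors, without fixing the aggregation.

```python
import numpy as np

def category_feature_vectors(labeled_vectors):
    """labeled_vectors: dict mapping user intention category -> list of text
    feature vectors of the sample texts in that category's subset.

    Each category feature vector is the mean of its subset's text feature
    vectors (an assumed aggregation for illustration).
    """
    return {cat: np.mean(np.stack(vecs), axis=0)
            for cat, vecs in labeled_vectors.items()}

labeled = {
    "navigate": [np.array([1.0, 0.0]), np.array([0.8, 0.2])],
    "play_music": [np.array([0.0, 1.0])],
}
cat_vecs = category_feature_vectors(labeled)
```

Every sample text in a subset then shares the same category feature vector, matching the last step of the claim.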
6. The method of claim 4, wherein after acquiring the sample speech recognition text set and before determining the category feature vector of the user intention category corresponding to the sample speech recognition text, the method further comprises:
for any sample speech recognition text, searching for sample speech recognition texts according to a preset text search strategy to obtain an expanded sample speech recognition text subset, wherein the user intention category corresponding to each sample speech recognition text in the expanded subset is the same as the user intention category corresponding to the any sample speech recognition text;
determining the category feature vector of the user intention category corresponding to the expanded sample speech recognition text subset;
wherein determining the category feature vector of the user intention category corresponding to the sample speech recognition text comprises:
taking the determined category feature vector of the user intention category corresponding to the expanded sample speech recognition text subset as the category feature vector of the user intention category corresponding to the sample speech recognition text.
7. The method of claim 5 or 6, wherein taking the text feature vector of the sample speech recognition text as the input of the user intention recognition model to be trained, taking the category feature vector of the user intention category corresponding to the sample speech recognition text and the intention information under the user intention category as the output of the user intention recognition model to be trained, and training to obtain the user intention recognition model comprises:
determining an initial category feature vector for each user intention category;
for each sample speech recognition text in the sample speech recognition text set, determining the attention information between the sample speech recognition text and each user intention category based on the text feature vector of the sample speech recognition text and the initial category feature vector of each user intention category, wherein the attention information is determined based on the similarity between the sample speech recognition text and each user intention category;
and, for each sample speech recognition text in the sample speech recognition text set, inputting the text feature vector of the sample speech recognition text into the user intention recognition model to be trained and performing at least one round of training, stopping the cycle when the difference vector between the category feature vector calculated using the determined attention information and the determined category feature vector of the user intention category corresponding to the sample speech recognition text meets the preset threshold requirement, thereby training to obtain the user intention recognition model.
8. The method of claim 7, wherein inputting the text feature vector of the sample speech recognition text into the user intention recognition model to be trained and performing at least one round of training comprises:
for each sample speech recognition text in the sample speech recognition text set, inputting the text feature vector of the sample speech recognition text into the user intention recognition model to be trained, and calculating, using the determined attention information, the category feature vector of the user intention category corresponding to the sample speech recognition text and the intention information under the user intention category;
determining the difference vector between the calculated category feature vector and the determined category feature vector of the user intention category corresponding to the sample speech recognition text, and adjusting the attention information based on the difference vector;
and cyclically executing the step of inputting the text feature vector of the sample speech recognition text into the user intention recognition model to be trained and calculating, using the adjusted attention information, the category feature vector of the corresponding user intention category and the corresponding intention information, until the difference vector meets the preset threshold requirement, at which point the cycle stops and the trained user intention recognition model is obtained.
9. A user intention recognition device, the device comprising:
a target acquisition module, configured to acquire a target speech recognition text;
a vector extraction module, configured to extract a text feature vector from the target speech recognition text;
an intention recognition module, configured to input the extracted text feature vector into a pre-trained user intention recognition model and determine the user intention category corresponding to the target speech recognition text and the intention information under the user intention category; wherein the user intention recognition model is trained based on the text feature vector of each sample speech recognition text and the attention information between each sample speech recognition text and each user intention category; and, for each sample speech recognition text, the attention information between the sample speech recognition text and each user intention category is determined based on the similarity between the sample speech recognition text and each user intention category;
wherein the similarity between the sample speech recognition text and any user intention category is determined by the following formula:

e_i = x_i A r

where x_i denotes the text feature vector of the sample speech recognition text, r denotes the category feature vector of the user intention category, and A denotes a weighted diagonal matrix.
10. The apparatus of claim 9, wherein the target acquisition module is specifically configured to:
in response to acquiring a user voice, determine the reference speech recognition text corresponding to the user voice;
and input the determined reference speech recognition text into a voice search engine to obtain the target speech recognition text corresponding to the reference speech recognition text.
11. The apparatus of claim 9, wherein the vector extraction module is specifically configured to:
sequentially segment the target speech recognition text to obtain a plurality of target speech recognition sub-texts;
sequentially input each target speech recognition sub-text into a pre-trained vector conversion model to obtain the text feature vector corresponding to each target speech recognition sub-text;
and combine the text feature vectors corresponding to all the target speech recognition sub-texts into the text feature vector of the target speech recognition text.
12. The apparatus of claim 9, further comprising a model training module, wherein the model training module comprises:
a sample acquisition unit, configured to acquire a sample speech recognition text set;
a category determination unit, configured to, for each sample speech recognition text in the sample speech recognition text set, extract the text feature vector of the sample speech recognition text and determine the category feature vector of the user intention category corresponding to the sample speech recognition text and the intention information under the user intention category;
a model training unit, configured to take the text feature vector of the sample speech recognition text as the input of the user intention recognition model to be trained, take the category feature vector of the user intention category corresponding to the sample speech recognition text and the intention information under the user intention category as the output of the user intention recognition model to be trained, and train to obtain the user intention recognition model.
13. The apparatus of claim 12, wherein the category determination unit is specifically configured to:
determine the user intention category corresponding to each sample speech recognition text in the sample speech recognition text set;
determine the sample speech recognition text subset corresponding to each user intention category, each sample speech recognition text subset comprising at least one sample speech recognition text;
for each sample speech recognition text subset, determine the category feature vector of the user intention category corresponding to the subset according to the text feature vectors of the at least one sample speech recognition text included in the subset;
and take the determined category feature vector of the user intention category corresponding to the sample speech recognition text subset as the category feature vector of the user intention category corresponding to any sample speech recognition text included in the subset.
14. The apparatus of claim 12, wherein the category determination unit is specifically configured to:
for any sample speech recognition text, search for sample speech recognition texts according to a preset text search strategy to obtain an expanded sample speech recognition text subset, wherein the user intention category corresponding to each sample speech recognition text in the expanded subset is the same as the user intention category corresponding to the any sample speech recognition text;
determine the category feature vector of the user intention category corresponding to the expanded sample speech recognition text subset;
and take the determined category feature vector of the user intention category corresponding to the expanded sample speech recognition text subset as the category feature vector of the user intention category corresponding to the sample speech recognition text.
15. The apparatus of claim 13 or 14, wherein the model training unit is specifically configured to:
determine an initial category feature vector for each user intention category;
for each sample speech recognition text in the sample speech recognition text set, determine the attention information between the sample speech recognition text and each user intention category based on the text feature vector of the sample speech recognition text and the initial category feature vector of each user intention category, wherein the attention information is determined based on the similarity between the sample speech recognition text and each user intention category;
and, for each sample speech recognition text in the sample speech recognition text set, input the text feature vector of the sample speech recognition text into the user intention recognition model to be trained and perform at least one round of training, stopping the cycle when the difference vector between the category feature vector calculated using the determined attention information and the determined category feature vector of the user intention category corresponding to the sample speech recognition text meets the preset threshold requirement, thereby training to obtain the user intention recognition model.
16. The apparatus of claim 15, wherein the model training unit is specifically configured to:
for each sample speech recognition text in the sample speech recognition text set, input the text feature vector of the sample speech recognition text into the user intention recognition model to be trained, and calculate, using the determined attention information, the category feature vector of the user intention category corresponding to the sample speech recognition text and the intention information under the user intention category;
determine the difference vector between the calculated category feature vector and the determined category feature vector of the user intention category corresponding to the sample speech recognition text, and adjust the attention information based on the difference vector;
and cyclically execute the step of inputting the text feature vector of the sample speech recognition text into the user intention recognition model to be trained and calculating, using the adjusted attention information, the category feature vector of the corresponding user intention category and the corresponding intention information, until the difference vector meets the preset threshold requirement, at which point the cycle stops and the trained user intention recognition model is obtained.
17. An electronic device, comprising: a processor, a storage medium, and a bus, the storage medium storing machine-readable instructions executable by the processor; when the electronic device runs, the processor and the storage medium communicate over the bus, and the processor executes the machine-readable instructions to perform the steps of the user intention recognition method according to any one of claims 1 to 8.
18. A computer-readable storage medium, characterized in that the computer-readable storage medium has stored thereon a computer program which, when executed by a processor, performs the steps of the user intention recognition method as claimed in any one of claims 1 to 8.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811490105.4A CN111292752B (en) | 2018-12-06 | 2018-12-06 | User intention recognition method and device, electronic equipment and storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811490105.4A CN111292752B (en) | 2018-12-06 | 2018-12-06 | User intention recognition method and device, electronic equipment and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111292752A CN111292752A (en) | 2020-06-16 |
CN111292752B true CN111292752B (en) | 2023-05-12 |
Family
ID=71023065
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811490105.4A Active CN111292752B (en) | 2018-12-06 | 2018-12-06 | User intention recognition method and device, electronic equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111292752B (en) |
Families Citing this family (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112116910A (en) * | 2020-10-30 | 2020-12-22 | 珠海格力电器股份有限公司 | Voice instruction recognition method and device, storage medium and electronic device |
CN112786041B (en) * | 2020-12-23 | 2023-11-24 | 光禹莱特数字科技(上海)有限公司 | Voice processing method and related equipment |
CN114694645A (en) * | 2020-12-31 | 2022-07-01 | 华为技术有限公司 | Method and device for determining user intention |
CN112988992B (en) * | 2021-02-08 | 2022-04-08 | 北京嘀嘀无限科技发展有限公司 | Information interaction method and device and electronic equipment |
CN112966088B (en) * | 2021-03-19 | 2022-06-03 | 北京三快在线科技有限公司 | Unknown intention recognition method, device, equipment and storage medium |
CN113326351A (en) * | 2021-06-17 | 2021-08-31 | 湖北亿咖通科技有限公司 | User intention determining method and device |
CN113591463B (en) * | 2021-07-30 | 2023-07-18 | 中国平安人寿保险股份有限公司 | Intention recognition method, device, electronic equipment and storage medium |
CN115563113B (en) * | 2022-09-29 | 2023-08-22 | 北京信智特科技有限公司 | Database index establishment method and system based on artificial intelligence |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107193865A (en) * | 2017-04-06 | 2017-09-22 | 上海奔影网络科技有限公司 | Natural language is intended to understanding method and device in man-machine interaction |
CN107329949A (en) * | 2017-05-24 | 2017-11-07 | 北京捷通华声科技股份有限公司 | A kind of semantic matching method and system |
CN107342075A (en) * | 2016-07-22 | 2017-11-10 | 江苏泰格软件有限公司 | A kind of Voice command performs the System and method for of APS system commands |
CN108920622A (en) * | 2018-06-29 | 2018-11-30 | 北京奇艺世纪科技有限公司 | A kind of training method of intention assessment, training device and identification device |
Family Cites Families (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7856351B2 (en) * | 2007-01-19 | 2010-12-21 | Microsoft Corporation | Integrated speech recognition and semantic classification |
CN105374356B (en) * | 2014-08-29 | 2019-07-30 | 株式会社理光 | Audio recognition method, speech assessment method, speech recognition system and speech assessment system |
CN104360994A (en) * | 2014-12-04 | 2015-02-18 | 科大讯飞股份有限公司 | Natural language understanding method and natural language understanding system |
JP6514503B2 (en) * | 2014-12-25 | 2019-05-15 | クラリオン株式会社 | Intention estimation device and intention estimation system |
US10304444B2 (en) * | 2016-03-23 | 2019-05-28 | Amazon Technologies, Inc. | Fine-grained natural language understanding |
CN107491518B (en) * | 2017-08-15 | 2020-08-04 | 北京百度网讯科技有限公司 | Search recall method and device, server and storage medium |
CN107943860B (en) * | 2017-11-08 | 2020-10-27 | 北京奇艺世纪科技有限公司 | Model training method, text intention recognition method and text intention recognition device |
CN108334891B (en) * | 2017-12-15 | 2021-01-05 | 北京奇艺世纪科技有限公司 | Task type intention classification method and device |
CN108428447B (en) * | 2018-06-19 | 2021-02-02 | 科大讯飞股份有限公司 | Voice intention recognition method and device |
- 2018-12-06: CN application CN201811490105.4A filed (patent CN111292752B, status: Active)
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107342075A (en) * | 2016-07-22 | 2017-11-10 | 江苏泰格软件有限公司 | A kind of Voice command performs the System and method for of APS system commands |
CN107193865A (en) * | 2017-04-06 | 2017-09-22 | 上海奔影网络科技有限公司 | Natural language is intended to understanding method and device in man-machine interaction |
CN107329949A (en) * | 2017-05-24 | 2017-11-07 | 北京捷通华声科技股份有限公司 | A kind of semantic matching method and system |
CN108920622A (en) * | 2018-06-29 | 2018-11-30 | 北京奇艺世纪科技有限公司 | A kind of training method of intention assessment, training device and identification device |
Also Published As
Publication number | Publication date |
---|---|
CN111292752A (en) | 2020-06-16 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111292752B (en) | User intention recognition method and device, electronic equipment and storage medium | |
CN108334487B (en) | Missing semantic information completion method and device, computer equipment and storage medium | |
CN107291783B (en) | Semantic matching method and intelligent equipment | |
CN110797016B (en) | Voice recognition method and device, electronic equipment and storage medium | |
CN110516253B (en) | Chinese spoken language semantic understanding method and system | |
CN111324743A (en) | Text relation extraction method and device, computer equipment and storage medium | |
CN110457689B (en) | Semantic processing method and related device | |
CN110597994A (en) | Event element identification method and device | |
CN106503231B (en) | Search method and device based on artificial intelligence | |
CN111967224A (en) | Method and device for processing dialog text, electronic equipment and storage medium | |
CN108681541B (en) | Picture searching method and device and computer equipment | |
CN109256125B (en) | Off-line voice recognition method and device and storage medium | |
CN110866100B (en) | Phonetics generalization method and device and electronic equipment | |
CN111651600B (en) | Sentence multi-intention recognition method, system, electronic equipment and storage medium | |
CN111198936B (en) | Voice search method and device, electronic equipment and storage medium | |
CN112906380A (en) | Method and device for identifying role in text, readable medium and electronic equipment | |
CN113326702B (en) | Semantic recognition method, semantic recognition device, electronic equipment and storage medium | |
CN111259170A (en) | Voice search method and device, electronic equipment and storage medium | |
EP2447854A1 (en) | Method and system of automatic diacritization of Arabic | |
CN112668333A (en) | Named entity recognition method and device, and computer-readable storage medium | |
CN114822519A (en) | Chinese speech recognition error correction method and device and electronic equipment | |
CN112100378A (en) | Text classification model training method and device, computer equipment and storage medium | |
CN111508497B (en) | Speech recognition method, device, electronic equipment and storage medium | |
CN112906381B (en) | Dialog attribution identification method and device, readable medium and electronic equipment | |
CN111680514B (en) | Information processing and model training method, device, equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |