CN117807215B - Statement multi-intention recognition method, device and equipment based on model


Publication number
CN117807215B
CN117807215B (application CN202410232119.5A)
Authority
CN
China
Prior art keywords
intention
sentence
text sentence
recognition
target
Prior art date
Legal status
Active
Application number
CN202410232119.5A
Other languages
Chinese (zh)
Other versions
CN117807215A (en)
Inventor
邓邱伟
赵培
田云龙
彭强
姚一格
Current Assignee
Qingdao Haier Technology Co Ltd
Qingdao Haier Intelligent Home Appliance Technology Co Ltd
Haier Uplus Intelligent Technology Beijing Co Ltd
Original Assignee
Qingdao Haier Technology Co Ltd
Qingdao Haier Intelligent Home Appliance Technology Co Ltd
Haier Uplus Intelligent Technology Beijing Co Ltd
Priority date
Filing date
Publication date
Application filed by Qingdao Haier Technology Co Ltd, Qingdao Haier Intelligent Home Appliance Technology Co Ltd, and Haier Uplus Intelligent Technology Beijing Co Ltd
Priority to CN202410232119.5A
Publication of CN117807215A
Application granted
Publication of CN117807215B
Legal status: Active

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Machine Translation (AREA)

Abstract

The application discloses a model-based sentence multi-intention recognition method, device and equipment, which can be applied to the technical field of voice recognition. A voice interaction text sentence is acquired, and whether the domain classification result of the voice interaction text sentence belongs to a target domain is identified. When it does not belong to the target domain, it is determined that the voice interaction text sentence may include a plurality of intentions; the voice interaction text sentence is then input into a multi-intention recognition GPT model for processing, obtaining a multi-intention recognition result. The multi-intention recognition GPT model is trained based on a model supervised fine-tuning mode, and/or is applied based on a multi-intention recognition prompt word construction mechanism. When the multi-intention recognition result is determined to be a plurality of single-intention interactive text sentences, each single-intention interactive text sentence is converted into a corresponding standard single-intention interactive text sentence. In this way, a plurality of intentions in the user's voice can be accurately recognized, making the interaction between the user and the dialogue system more intelligent.

Description

Statement multi-intention recognition method, device and equipment based on model
Technical Field
The present application relates to the field of speech recognition technologies, and in particular, to a method, an apparatus, and a device for recognizing multiple intents of a sentence based on a model.
Background
With the rapid development of smart home, a user can control the smart terminal devices in the smart home through voice. Currently, dialog systems in smart homes can only recognize a single intent in the user's voice, such as the intent of setting an alarm clock. When the user voice includes a plurality of intents, the dialog system can only parse the single most likely intention, so the intention parsing effect of the dialog system is poor.
Disclosure of Invention
In order to solve the technical problems, the application provides a sentence multi-intention recognition method, a sentence multi-intention recognition device and sentence multi-intention recognition equipment based on a model, which can recognize a plurality of intentions in user voices.
In order to achieve the above purpose, the technical scheme provided by the application is as follows:
In a first aspect, the present application provides a method for identifying multiple intents of a sentence based on a model, the method comprising:
Judging whether the domain classification result of the voice interaction text sentence obtained by voice recognition of the user voice belongs to the target domain or not;
When the field classification result of the voice interaction text sentence is judged not to belong to the target field, inputting the voice interaction text sentence into a multi-intention recognition GPT model, and obtaining a multi-intention recognition result output by the multi-intention recognition GPT model; the multi-intention recognition GPT model is trained based on a model supervised fine-tuning mode and/or is applied based on a multi-intention recognition prompt word construction mechanism;
Judging whether the multi-intention recognition result is a plurality of single-intention interactive text sentences or not;
When the multi-intention recognition result is judged to be a plurality of single-intention interactive text sentences, each single-intention interactive text sentence is converted into a corresponding standard single-intention interactive text sentence; wherein each of the standard single-intent interactive text statements is used to represent one of the interactive intents in the user's speech.
In a second aspect, the present application provides a model-based sentence multiple intention recognition device, the device comprising:
a first judging unit for judging whether the domain classification result of the voice interaction text sentence obtained by voice recognition of the user voice belongs to the target domain;
The first acquisition unit is used for inputting the voice interaction text sentence into a multi-intention recognition GPT model when the field classification result of the voice interaction text sentence is judged not to belong to the target field, and acquiring a multi-intention recognition result output by the multi-intention recognition GPT model; the multi-intention recognition GPT model is trained based on a model supervised fine-tuning mode and/or is applied based on a multi-intention recognition prompt word construction mechanism;
a second judging unit for judging whether the multi-intention recognition result is a plurality of single-intention interactive text sentences;
The first conversion unit is used for converting each single-intention interaction text sentence into a corresponding standard single-intention interaction text sentence when the multi-intention recognition result is judged to be a plurality of single-intention interaction text sentences; wherein each of the standard single-intent interactive text statements is used to represent one of the interactive intents in the user's speech.
In a third aspect, the present application provides an electronic device comprising:
One or more processors;
A storage device having one or more programs stored thereon,
The one or more programs, when executed by the one or more processors, cause the one or more processors to implement the model-based statement multi-intent recognition method as described in the first aspect.
In a fourth aspect, the present application provides a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the model-based statement multi-intent recognition method as described in the first aspect.
According to the technical scheme, the application has the following beneficial effects:
The application provides a sentence multi-intention recognition method, a sentence multi-intention recognition device and sentence multi-intention recognition equipment based on a model. The dialogue system carries out voice recognition on the user voice and recognizes a voice interaction text sentence matched with the user voice. To determine whether multiple intents may be included in a voice interactive text sentence, the dialog system identifies whether the domain classification result of the voice interactive text sentence belongs to a target domain. When the domain classification result of the voice interaction text sentence is not recognized as belonging to the target domain, it is determined that the voice interaction text sentence may include a plurality of intentions, and the voice interaction text sentence is then input into a multi-intention recognition GPT model for processing to obtain a multi-intention recognition result output by the multi-intention recognition GPT model. The multi-intention recognition GPT model is trained based on a model supervised fine-tuning mode, and/or is applied based on a multi-intention recognition prompt word construction mechanism. Further, when it is determined that the multi-intent recognition result is a plurality of single-intent interactive text sentences, it is indicated that the voice interactive text sentence has been disassembled into a plurality of single intents. At this time, each single-intent interaction text sentence is converted into a corresponding standard single-intent interaction text sentence. The obtained plurality of standard single-intent interactive text sentences are used for representing the plurality of intentions in the user voice, and the standard single-intent interactive text sentences facilitate the subsequent processing of the dialog engine in the dialog system.
Through the method, whether a plurality of intentions exist in the user voice is firstly identified, when the plurality of intentions exist, the plurality of single intentions in the user voice are mined by utilizing the multi-intention identification GPT model, and the single-intention interactive text sentence representing each single intention is a standard single-intention interactive text sentence which is convenient for a subsequent dialog engine to process. Therefore, a plurality of intentions in the voice of the user can be accurately recognized, so that the interaction experience of the user and the dialogue system is more intelligent.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, and it is obvious that the drawings in the following description are some embodiments of the present application, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic diagram of an implementation environment of a method for identifying multiple intent of a sentence based on a model according to an embodiment of the present application;
FIG. 2 is a flowchart of a method for identifying multiple intent of a sentence based on a model according to an embodiment of the present application;
FIG. 3 is a flowchart of another method for identifying multiple intent of a sentence based on a model according to an embodiment of the present application;
Fig. 4 is a schematic structural diagram of a device for identifying multiple intent of a sentence based on a model according to an embodiment of the present application.
Detailed Description
In order that the above-recited objects, features and advantages of the present application will become more readily apparent, a more particular description of embodiments of the application will be rendered by reference to the appended drawings and the detailed description below.
It should be understood that the defects of the related technical solutions are all results obtained by the applicant after practice and careful study. Therefore, both the discovery of the problems in the related technical solutions and the solutions presented below for those problems should be regarded as the applicant's contributions to the embodiments of the present application.
In order to facilitate understanding of the model-based statement multi-intention recognition method provided by the embodiment of the application, the model-based statement multi-intention recognition method provided by the embodiment of the application is described below with reference to the accompanying drawings.
The sentence multi-intention recognition method based on the model provided by the embodiment of the application can be widely applied to full-house intelligent digital control application scenarios such as Smart Home, smart home equipment ecology, and Intelligence House ecology. It should be appreciated that the above application scenarios are merely examples and are not limiting, and that these exemplary application scenarios may be implemented by the implementation environment depicted in fig. 1. Referring to fig. 1, the embodiment of the application provides an implementation environment schematic diagram of the sentence multi-intention recognition method based on the model. As shown in fig. 1, the implementation environment includes a terminal device 101 and a server 102. The server 102 is provided with a dialogue system including a dialogue engine. The dialogue system is used for processing the voice of the user and generating an instruction for controlling the terminal equipment. Furthermore, the dialogue system uploads the instruction to the cloud end and sends the instruction to the dialogue engine through the cloud end; alternatively, the dialogue system directly sends the generated instruction to the dialogue engine.
The terminal device 101 includes, but is not limited to, a PC, a mobile phone, a tablet computer, an intelligent air conditioner, an intelligent smoke machine, an intelligent refrigerator, an intelligent oven, an intelligent stove, an intelligent washing machine, an intelligent water heater, an intelligent washing device, an intelligent dish washer, an intelligent projection device, an intelligent television, an intelligent clothes hanger, an intelligent curtain, an intelligent video, an intelligent socket, an intelligent sound box, an intelligent fresh air device, an intelligent kitchen and toilet device, an intelligent bathroom device, an intelligent sweeping robot, an intelligent window cleaning robot, an intelligent mopping robot, an intelligent air purifying device, an intelligent steam box, an intelligent microwave oven, an intelligent kitchen appliance, an intelligent purifier, an intelligent water dispenser, an intelligent door lock and the like.
The server 102 is connected to the terminal apparatus 101 via a network. Server 102 may be used to provide services (e.g., application services, etc.) to terminal device 101 or to clients installed on terminal device 101. Also, a database may be provided on the server 102 or independent of the server 102, the database being used to provide data storage services for the server 102; cloud computing and/or edge computing services may be configured on server 102 or independent of server 102, for providing data manipulation services for server 102. It should be understood that the type of server 102 is not limited herein and may be determined according to actual circumstances.
Based on the above implementation environment, the terminal device 101 captures the user voice provided by the user, and then transmits the user voice to the dialogue system in the server 102. The dialogue system carries out voice recognition on the user voice and recognizes a voice interaction text sentence matched with the user voice. To determine whether multiple intents may be included in a voice interactive text sentence, the dialog system identifies whether the domain classification result of the voice interactive text sentence belongs to a target domain. When the domain classification result of the voice interaction text sentence is recognized not to belong to the target domain, the dialogue system determines that a plurality of intentions may be included in the voice interaction text sentence. At this time, the dialogue system inputs the voice interactive text sentence into the multi-intention recognition GPT model, and obtains the multi-intention recognition result output by the multi-intention recognition GPT model. The multi-intention recognition GPT model is trained based on a model supervised fine-tuning mode, and/or is applied based on a multi-intention recognition prompt word construction mechanism.
Further, the dialogue system judges whether the multi-intention recognition result is a plurality of single-intention interactive text sentences; when it is, the dialogue system determines that the voice interactive text sentence has been disassembled into a plurality of single intents. At this point, the dialog system converts each single-intent interactive text statement into a corresponding standard single-intent interactive text statement. The obtained plurality of standard single-intent interactive text sentences are used for representing the plurality of intentions in the user voice, and the standard single-intent interactive text sentences facilitate the subsequent processing of the dialog engine in the dialog system. Further, the dialog system can control the corresponding terminal device 101 based on the standard single-intent interactive text statements.
Those skilled in the art will appreciate that the frame diagram shown in fig. 1 is but one example in which embodiments of the present application may be implemented. The scope of applicability of the embodiments of the application is not limited in any way by the framework.
In order to facilitate understanding of the present application, a description is given below of a model-based statement multi-intent recognition method provided in an embodiment of the present application with reference to the accompanying drawings.
Referring to fig. 2, which is a flowchart of a method for identifying multiple intents of a sentence based on a model according to an embodiment of the present application, the method may be applied to the server in the above embodiment, and in particular, to the dialogue system of the server. As shown in fig. 2, the method may include S201-S204:
s201: and judging whether the domain classification result of the voice interaction text sentence obtained by voice recognition of the user voice belongs to the target domain.
In an actual interaction scene, a user firstly sends out voice to a terminal device, the terminal device receives the voice of the user and sends the voice of the user to a server, and the server carries out multi-purpose recognition on the voice of the user.
Specifically, after receiving the user voice, the server performs voice recognition on the user voice to obtain a voice interaction text sentence matched with the user voice, namely, a sentence in text form. At this point, the multi-intention recognition of the user voice is converted into sentence multi-intention recognition of the matched voice interactive text sentence, and subsequent processing is performed based on the voice interactive text sentence.
In order to realize sentence multi-intention recognition of voice interactive text sentences, it is first judged whether the domain classification result of the voice interactive text sentence belongs to the target domain; according to the judgment result, it is determined whether the voice interactive text sentence is likely to include multiple intents, and thus whether multi-intention recognition of the sentence is needed. Performing multi-intention recognition only when the voice interaction text sentence may include multiple intents improves the efficiency of multi-intention recognition.
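This gating decision can be sketched as follows. This is a minimal illustration, not the patent's implementation; the function names, domain labels, and stub callables are all hypothetical:

```python
from typing import Callable, List

# Hypothetical sketch of the routing decision: only sentences whose domain
# classification fails (a non-target domain) are sent on to the
# multi-intention recognition model for disassembly.
def route_sentence(
    text: str,
    classify_domain: Callable[[str], str],
    split_intents: Callable[[str], List[str]],
    target_domains=("device_control", "life_skill"),
) -> List[str]:
    domain = classify_domain(text)
    if domain in target_domains:
        # Standard single-intention sentence: pass through to the dialog engine.
        return [text]
    # Possibly several intents: delegate to the multi-intention model.
    return split_intents(text)
```

In use, `classify_domain` would wrap the domain classifier and `split_intents` the multi-intention recognition GPT model; here they are left as injected callables so the control flow stands alone.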
Wherein the target field includes a device control field or a life skill field. "device control" in the field of device control may be understood as a control item for a smart device. The voice interaction text sentence in the equipment control field mainly comprises control items of intelligent terminal equipment such as a refrigerator, an air conditioner, a washing machine, a water heater and the like, for example, the intelligent terminal equipment is opened, the intelligent terminal equipment is closed, the mode of the intelligent terminal equipment is set, the intelligent terminal equipment is started, the intelligent terminal equipment is suspended, the temperature of the intelligent terminal equipment is set, and the wind speed of the intelligent terminal equipment is set. "Life skills" in the field of life skills is understood as non-equipment control class skills or non-equipment control class matters related to daily life. The voice interaction text sentence in the life skill field mainly comprises skills such as inquiring weather, playing music, setting an alarm clock, setting a schedule, inquiring a calendar, inquiring time and the like.
In practical application, the voice interaction text sentence can be input into a domain classifier, and the domain classifier outputs a domain classification result of the voice interaction text sentence. And determining whether the voice interaction text sentence belongs to the target field or not according to the field classification result output by the field classifier.
The domain classifier may be embodied as a BERT model, for example. The domain classifier can be obtained by pre-training a model based on BERT and fine-tuning on a specific data set, and the domain recognition effect of the domain classifier based on the BERT model is good. The specific data set is a data set comprising standard single-intention text sample data and domain classification labels corresponding to the standard single-intention text sample data.
In an embodiment of the application, the standard single-intent text sample data comprises standard single-intent text sample data in the field of device control and/or standard single-intent text sample data in the field of life skills. For example, standard single-intention text sample data in the field of life skills may include "alarm clock setting 9 o' clock in tomorrow" and "inquiring tomorrow weather", etc., and tags of these standard single-intention text sample data are in the field of life skills; standard single-intent text sample data within the field of device control may include "reserve 9 on water heater in the morning of tomorrow", "9 on air conditioner in the morning of tomorrow", etc., and tags of these standard single-intent text sample data are the field of device control.
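The labeled samples above can be arranged as records like the following. The field names and English phrasings are illustrative assumptions, not mandated by the patent:

```python
# Hypothetical records for the domain classifier's fine-tuning data set:
# standard single-intention text sample data paired with domain labels.
DOMAIN_SAMPLES = [
    {"text": "set an alarm clock for 9 o'clock tomorrow", "label": "life_skill"},
    {"text": "check tomorrow's weather", "label": "life_skill"},
    {"text": "reserve the water heater for 9 tomorrow morning", "label": "device_control"},
    {"text": "turn on the air conditioner at 9 tomorrow morning", "label": "device_control"},
]
```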
It can be appreciated that, since the specific data set of the training domain classifier is a data set including standard single-intention text sample data and domain classification labels corresponding to the standard single-intention text sample data, when the voice interactive text sentence input into the domain classifier is a standard single-intention interactive text sentence, the domain classifier recognizes the voice interactive text sentence as a result of the target domain. The target field may be a device control field or a life skill field. When the speech interactive text sentence input to the domain classifier is not a standard single-intent interactive text sentence, the domain classifier recognizes the speech interactive text sentence as a result of a non-target domain (recognition may also be considered as failed). Based on this, it can be determined whether the voice interactive text sentence input to the domain classifier is a standard single-intention interactive text sentence by recognizing whether the voice interactive text sentence is a result of the target domain. That is, when the domain classifier can directly output the target domain of the standard single-intention interactive text sentence, the voice interactive text sentence can be determined as the standard single-intention interactive text sentence.
Wherein, when the voice interactive text sentence is not a standard single-intention interactive text sentence, the voice interactive text sentence may be a fuzzy single-intention interactive text sentence or a multi-intention interactive text sentence. "Standard" indicates that the single-intention interactive text sentence can be directly identified by the dialogue engine and successfully parsed to realize control of the terminal equipment. "Fuzzy" corresponds to "standard" and indicates that the expression of the single-intent interactive text statement is relatively ambiguous or obscure and cannot be directly and successfully parsed by the dialog engine. For example, "too hot" is a fuzzy single-intent interactive text statement, while "turn on the air conditioner" is a standard single-intent interactive text statement that can be directly converted into the form of an instruction that controls the intelligent terminal device.
S202: when the field classification result of the voice interaction text sentence is judged not to belong to the target field, inputting the voice interaction text sentence into a multi-intention recognition GPT model, and obtaining a multi-intention recognition result output by the multi-intention recognition GPT model; the multi-purpose recognition GPT model is trained based on a model supervision fine tuning mode and/or is applied based on a multi-purpose recognition prompt word construction mechanism.
Referring to fig. 3, fig. 3 is a flowchart of another method for identifying multiple intent of a sentence based on a model according to an embodiment of the present application. As shown in fig. 3, when it is recognized that the domain classification result of the voice interactive text sentence belongs to the target domain (the device control domain or the life skill domain), it is determined that the voice interactive text sentence is a standard single-intention interactive text sentence. At this time, the voice interactive text sentence is directly sent to the dialogue engine for processing. Specifically, the dialogue engine performs semantic analysis and/or dialogue management of domain intention recognition on the voice interaction text sentence, and controls the corresponding terminal equipment to execute the intention. Where semantic parsing refers to recognizing the intent of a voice interactive text statement from the text level. Dialog management, i.e., context management. The equipment control field comprises the refinement fields of air conditioners, refrigerators, water heaters and the like. The life skill field comprises the refinement fields of alarm clocks, news, weather checking and the like. For example, the voice interactive text sentence is "turn on air conditioner", the dialog system performs semantic analysis of domain intention recognition to obtain refined domain as air conditioner, the intention is turned on, and then the air conditioner is controlled to execute the intention, namely, the air conditioner is controlled to be started.
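The worked example above ("turn on air conditioner" parsed into a refined domain and an intent) can be pictured as the dialog engine producing a structured parse such as the following. The function, field names, and values are illustrative only:

```python
# Hypothetical structured output of the dialog engine's semantic parsing
# (domain intention recognition) for a standard single-intention sentence.
def parse_standard_sentence(text: str) -> dict:
    # A toy lookup standing in for real semantic parsing; only the
    # sentence from the example in the text is known here.
    table = {
        "turn on the air conditioner": {
            "domain": "device_control",
            "refined_domain": "air_conditioner",
            "intent": "turn_on",
        },
    }
    return table[text]
```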
When the domain classification result of the voice interaction text sentence is judged not to belong to the target domain (namely, neither the equipment control domain nor the life skill domain), the recognition is regarded as failed. At this time, the voice interactive text sentence may be a fuzzy single-intention interactive text sentence or a multi-intention interactive text sentence. That is, the voice interactive text statement may be a multi-intent interactive text statement. Multi-intention analysis and mining can therefore be performed on the voice interaction text sentence.
Specifically, as shown in fig. 3, the voice interactive text sentence is input into a multi-intention recognition GPT model to perform multi-intention analysis and mining, so as to obtain a multi-intention recognition result output by the multi-intention recognition GPT model. For example, the multi-intent recognition GPT model may be a HomeGPT model. Here, GPT (Generative Pre-trained Transformer) is a deep learning model.
The embodiment of the application provides two specific implementations for obtaining the multi-intention recognition result after the voice interaction text sentence is analyzed and mined based on the multi-intention recognition GPT model. One is an implementation based on instruction fine tuning, in which the multi-intention recognition GPT model is trained based on a model supervised fine-tuning mode, so that the trained multi-intention recognition GPT model can be applied directly. The other is an implementation based on prompt learning, in which the application of the multi-intention recognition GPT model is combined with a multi-intention recognition prompt word construction mechanism. In practical application, one implementation may be selected according to actual requirements to improve the flexibility of multi-intention recognition; alternatively, multi-intention recognition may be performed with both implementations and the two multi-intention recognition results compared, so that the recognition result with higher reliability is selected, improving the accuracy of the multi-intention recognition result.
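For the prompt-learning implementation, a multi-intention recognition prompt could be assembled from a task description plus few-shot examples along the following lines. This is a sketch under assumed conventions; the patent does not specify the prompt wording or layout:

```python
# Hypothetical multi-intention recognition prompt word construction:
# a task description, a few worked examples, then the sentence to parse.
def build_multi_intent_prompt(sentence: str, examples: list) -> str:
    lines = [
        "Task: split the user sentence into independent single-intention sentences.",
    ]
    for src, intents in examples:
        lines.append(f"Input: {src}")
        lines.append("Output: " + " | ".join(intents))
    lines.append(f"Input: {sentence}")
    lines.append("Output:")  # the model completes from here
    return "\n".join(lines)
```

The resulting string would be sent to the GPT model, whose completion after the final "Output:" would be parsed back into the list of single-intention sentences.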
A first specific embodiment for obtaining a multi-intent recognition result after parsing and mining a voice interactive text sentence based on a multi-intent recognition GPT model will be described in detail.
In one possible implementation manner, before the voice interaction text sentence is input into the multi-intention recognition GPT model, the method further includes the training process of the multi-intention recognition GPT model, which specifically includes steps A1-A5:
A1: acquiring a training data set comprising training sample data; the training sample data comprises descriptive sentences of the multi-intention recognition task, interactive text sentence examples and a plurality of single-intention interactive text sentence examples which are analyzed by the interactive text sentence examples.
The training process of the multi-intention recognition GPT model comprises a pre-training stage and a fine-tuning stage. In the embodiment of the application, the multi-intention recognition GPT model is a model which has already been pre-trained; when it is trained, only fine tuning based on the training sample data is required. Fine tuning refers to tuning parameters in the multi-intention recognition GPT model, and the specific parameters are not limited herein.
The multi-intent recognition GPT model is used to implement multi-intent parsing and mining. Training sample data suitable for the multi-intent parsing and mining task can be obtained through template construction, online log collection, manual annotation and other means, which are not limited herein. The training sample data comprises a plurality of pieces of sample data, and each piece comprises a description sentence of the multi-intent recognition task, an interactive text sentence example, and a plurality of single-intent interactive text sentence examples parsed from the interactive text sentence example.
In practical applications, the training sample data may be represented in the format of an instruction fine-tuning data set that includes an instruction field `instruction` and its field value, an original input field `input` and its field value, and a target output field `target` and its field value. The description sentence of the multi-intent recognition task can be understood as the field value of the `instruction` field, the interactive text sentence example as the field value of the `input` field, and the plurality of single-intent interactive text sentence examples parsed from the interactive text sentence example as the field value of the `target` field.
The field value of the `instruction` field specifically describes the multi-intent parsing and mining task, the field value of the `input` field provides the sentence to be parsed (i.e., the original sentence, such as the interactive text sentence example), and the field value of the `target` field provides the parsing result to be output (such as the plurality of single-intent interactive text sentence examples parsed from the interactive text sentence example).
By way of example, one instruction fine-tuning data set is as follows:
{
"instruction": "You are now a text parser for multi-intent parsing and mining; output, in JSON format, the single-intent phrases that may be contained in the original sentence.",
"input": "Drive to the Qingdao Hotel at 9 tomorrow morning to attend a business meeting",
"target": "[{\"text\": \"set an alarm for 9 tomorrow morning\"}, {\"text\": \"query tomorrow's Qingdao tail-number driving restrictions\"}, {\"text\": \"schedule the water heater to turn on tomorrow morning\"}, {\"text\": \"search for a driving route to the Qingdao Hotel\"}, {\"text\": \"query tomorrow morning's weather in Qingdao\"}]"
}。
In practical application, as many interactive text sentence examples as possible, together with the single-intent interactive text sentence examples parsed from them, can be included in the training sample data, which improves the recognition effect of the multi-intent recognition GPT model.
As can be seen from the instruction fine-tuning data set, the parsed single-intent interactive text sentence examples include not only direct intents but also potential intents. For example, "set an alarm for 9 tomorrow morning", "query tomorrow's Qingdao tail-number driving restrictions" and "search for a driving route to the Qingdao Hotel" are all direct intents obtained by direct parsing, while "schedule the water heater to turn on tomorrow morning" and "query tomorrow morning's weather in Qingdao" are indirect, potential intents. Therefore, after fine-tuning on such a data set, the multi-intent recognition GPT model can both parse the direct intents in the voice interaction text sentence and mine its potential intents. This yields a good multi-intent recognition effect, improves the user's interactive experience, and makes control of the terminal equipment more intelligent.
A2: splicing the description sentence of the multi-intent recognition task in the training sample data with the interactive text sentence example to obtain a spliced sentence.
Before fine-tuning the multi-intent recognition GPT model, the description sentence of the multi-intent recognition task (i.e., the field value of the `instruction` field) in the training sample data is spliced with the interactive text sentence example (i.e., the field value of the `input` field) to obtain a spliced sentence. The spliced sentence may be represented as the field value of a `context` field.
The context field and the format of the field value are as follows:
{
"context": " instruction:<instruction>\n input:<input>",
"target": "```<target_json>```"
}。
The field value of the `context` field is "instruction: <instruction> \n input: <input>".
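The splicing of step A2 can be sketched as follows (a minimal Python illustration; the `build_context` helper is hypothetical, but the field layout follows the format shown above):

```python
def build_context(instruction: str, input_sentence: str) -> dict:
    """Splice the task description (instruction field value) with the
    interactive text sentence example (input field value) into the
    field value of the context field."""
    return {"context": f"instruction:{instruction}\ninput:{input_sentence}"}

record = build_context(
    "You are now a text parser for multi-intent parsing and mining.",
    "Drive to the Qingdao Hotel at 9 tomorrow morning",
)
# record["context"] starts with "instruction:" and carries the input on a new line
```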
A3: adding a mask mark after the spliced sentence to obtain the model input sentence.
For example, a P-Tuning based approach may be used to perform supervised fine-tuning of the multi-intent recognition GPT model. To this end, the spliced sentence obtained in A2 needs to be converted into a format better suited to this fine-tuning method, i.e., the input of the multi-intent recognition GPT model is reconstructed. Fine-tuning the multi-intent recognition GPT model with the reconstructed input and the original `target` field value can improve the model's accuracy.
Specifically, a mask mark is added after the spliced sentence to obtain the model input sentence. The format of the model input sentence, i.e., the converted format, may be represented as the field value of an `input_ids` field, where the mask mark is denoted [MASK].
{
"input_ids": <context> + [MASK],
"labels": [SEP] + <target> + [SEP]
}。
The field value of the `input_ids` field is "<context> + [MASK]"; "labels" represents the label, and [SEP] is a sentence separation flag.
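Step A3 can be sketched at the token level (a hedged illustration: the ids 103 and 102 for [MASK] and [SEP] are assumed placeholders, and real values depend on the tokenizer):

```python
MASK_ID = 103  # assumed id of the mask mark [MASK]
SEP_ID = 102   # assumed id of the sentence separation flag [SEP]

def build_model_input(context_ids: list, target_ids: list) -> dict:
    """Reconstruct the model input for supervised fine-tuning: append
    the mask mark to the spliced context, and bracket the target with
    sentence separation flags."""
    return {
        "input_ids": context_ids + [MASK_ID],
        "labels": [SEP_ID] + target_ids + [SEP_ID],
    }

sample = build_model_input([11, 12, 13], [21, 22])
# sample["input_ids"] == [11, 12, 13, 103]
# sample["labels"] == [102, 21, 22, 102]
```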
A4: inputting the model input sentence into the multi-intent recognition GPT model to obtain a plurality of predicted single-intent interactive text sentences output by the model.
A5: training the multi-intent recognition GPT model according to the plurality of predicted single-intent interactive text sentences and the plurality of single-intent interactive text sentence examples.
A "single-intent interactive text sentence example" can be understood as a sentence the multi-intent recognition GPT model is expected to output, while a "predicted single-intent interactive text sentence" is a sentence it actually outputs. A loss function can therefore be constructed from the plurality of predicted single-intent interactive text sentences and the plurality of single-intent interactive text sentence examples, and the resulting loss values used to fine-tune the multi-intent recognition GPT model.
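The loss computation of step A5 can be sketched as a toy negative log-likelihood (a stand-in for the usual cross-entropy between the predicted sentences and the example sentences; the function name and the tiny vocabulary are illustrative):

```python
import math

def token_nll_loss(pred_probs, target_ids):
    """Average negative log-likelihood of the expected output tokens
    under the model's predicted per-step token distributions."""
    losses = [-math.log(step[tok]) for step, tok in zip(pred_probs, target_ids)]
    return sum(losses) / len(losses)

# Two decoding steps over a 3-token vocabulary; expected tokens are 0 and 2.
loss = token_nll_loss([[0.9, 0.05, 0.05], [0.1, 0.1, 0.8]], [0, 2])
# A lower loss means the predicted sentences better match the examples.
```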
Through steps A1-A5, a multi-intent recognition GPT model suitable for multi-intent parsing and mining can be obtained directly by fine-tuning. When the fine-tuned model is used for multi-intent parsing and mining, the description sentence of the multi-intent recognition task (see the examples above) can be spliced with the voice interaction text sentence, and the spliced sentence input directly into the fine-tuned model to obtain the multi-intent recognition result it outputs. The fine-tuned multi-intent recognition GPT model can both parse the direct intents in the voice interaction text sentence and mine its potential intents, giving a good multi-intent recognition effect.
A second specific embodiment for obtaining a multi-intent recognition result after parsing and mining a voice interactive text sentence based on a multi-intent recognition GPT model will be described in detail.
Illustratively, inputting the voice interaction text sentence into the multi-intent recognition GPT model and obtaining the multi-intent recognition result output by the model comprises steps B1-B3:
B1: constructing a first prompt word based on the voice interaction text sentence, the description sentence of the multi-intention recognition task and the sample example of the multi-intention recognition task; sample examples of the multi-intent recognition task include an interactive text statement example and a plurality of single-intent interactive text statement examples parsed by the interactive text statement example.
The first prompt word is constructed to implement prompt learning and may be denoted as Prompt; the process of constructing it can be understood as a multi-intent recognition prompt word construction mechanism. Through it, the multi-intent recognition GPT model can parse and mine the multiple intents of the voice interaction text sentence based on the prompt learning method.
An example of a first prompt word for the multi-intent parsing and mining task is provided below:
Prompt = "You are now a text parser for multi-intent parsing and mining; output, in JSON format, the single-intent phrases that may be contained in the original sentence. For example: \"Drive to the Qingdao Hotel at 9 tomorrow morning to attend a business meeting\" \n Output: [{\"text\": \"set an alarm for 9 tomorrow morning\"}, {\"text\": \"query tomorrow's Qingdao tail-number driving restrictions\"}, {\"text\": \"schedule the water heater to turn on tomorrow morning\"}, {\"text\": \"search for a driving route to the Qingdao Hotel\"}, {\"text\": \"query tomorrow morning's weather in Qingdao\"}] \n Based on the above information, answer the following questions: \n \"Get home at 8 tonight and make braised pork for the kids\" \n Output: ".
Here, the description sentence of the multi-intent recognition task (i.e., the multi-intent parsing and mining task) is "You are now a text parser for multi-intent parsing and mining; output, in JSON format, the single-intent phrases that may be contained in the original sentence". The content after "For example" and before "Output" is the interactive text sentence example; the content after "Output" and before "Based" is the plurality of single-intent interactive text sentence examples parsed from it; and "Get home at 8 tonight and make braised pork for the kids" is the voice interaction text sentence.
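The construction of the first prompt word in B1 can be sketched as follows (the helper and its argument names are illustrative, not part of the method):

```python
import json

def build_first_prompt(task_desc, example_in, example_out, query):
    """Assemble the first prompt word from the task description sentence,
    one interactive text sentence example with its parsed single-intent
    examples, and the voice interaction text sentence to be parsed."""
    return (
        f"{task_desc}\n"
        f'For example: "{example_in}"\n'
        f"Output: {json.dumps(example_out, ensure_ascii=False)}\n"
        "Based on the above information, answer the following questions:\n"
        f'"{query}"\nOutput: '
    )

prompt = build_first_prompt(
    "You are now a text parser for multi-intent parsing and mining.",
    "Drive to the Qingdao Hotel at 9 tomorrow morning",
    [{"text": "set an alarm for 9 tomorrow morning"},
     {"text": "query tomorrow morning's weather in Qingdao"}],
    "Get home at 8 tonight and make braised pork for the kids",
)
# The model's completion after the trailing "Output: " is the recognition result.
```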
In one possible implementation, the present application provides a sample example acquisition process for two multi-intent recognition tasks, see in particular E1-E3 and F1-F3 below.
B2: inputting the first prompt word into a multi-intention recognition GPT model so that the multi-intention recognition GPT model analyzes the voice interaction text statement according to the description statement of the multi-intention recognition task and the sample example of the multi-intention recognition task.
In practical application, the multi-intention recognition GPT model determines specific task content of the multi-intention recognition task according to description sentences of the multi-intention recognition task, learns by taking an interactive text sentence example and a plurality of single-intention interactive text sentence examples obtained by analyzing the interactive text sentence example as references, and analyzes the multi-intention of the voice interactive text sentence.
The multi-intent recognition GPT model here is a pre-trained model; in this implementation, no fine-tuning is required, and the model can be applied directly in combination with the first prompt word.
It can be understood that the multiple single-intent interactive text sentence examples provided by the embodiment of the application not only include direct intent in the interactive text sentence example, but also include potential intent in the interactive text sentence example. When the multi-intention recognition GPT model is used for learning by taking a sample example of a multi-intention recognition task as a reference, the single-intention interaction text sentence in the voice interaction text sentence can be analyzed, more potential single-intention interaction text sentences can be mined, and the multi-intention recognition effect is good.
B3: and acquiring a multi-intention recognition result obtained by analyzing the multi-intention recognition GPT model.
After the multi-intent recognition GPT model learns from the interactive text sentence example and the plurality of single-intent interactive text sentence examples parsed from it, the voice interaction text sentence can be parsed into a plurality of single-intent interactive text sentences.
Through steps B1-B3, the multi-intent recognition GPT model can parse the multiple intents of the voice interaction text sentence based on prompt learning, both parsing direct intents and mining potential intents.
Based on the above, when the voice interactive text sentence is parsed based on the multi-intent recognition GPT model, not only can a plurality of intentions related to the voice interactive text sentence be obtained, but also the plurality of intentions are various levels of intentions, such as direct intentions and indirect potential intentions. Thus, the method can help the user consider more possible situations in advance, and enable the user to experience more intelligent interaction.
S203: it is determined whether the multi-intent recognition result is a plurality of single-intent interactive text sentences.
As noted above, the voice interaction text sentence may be a fuzzy single-intent interactive text sentence or a multi-intent interactive text sentence. In practical applications, when the voice interaction text sentence is a fuzzy single-intent interactive text sentence, the multi-intent recognition result is not represented as a plurality of single-intent interactive text sentences; when it is a multi-intent interactive text sentence, the multi-intent recognition result may include a plurality of single-intent interactive text sentences.
Based on this, whether the voice interaction text sentence is a fuzzy single-intent interactive text sentence or a multi-intent interactive text sentence can be determined by judging whether the multi-intent recognition result is a plurality of single-intent interactive text sentences. "Determining whether the multi-intent recognition result is a plurality of single-intent interactive text sentences" can be understood as "whether the voice interaction text sentence can be broken down into a plurality of single-intent interactive text sentences", as shown in fig. 3.
S204: when the multi-intention recognition result is judged to be a plurality of single-intention interactive text sentences, each single-intention interactive text sentence is converted into a corresponding standard single-intention interactive text sentence; wherein each standard single-intent interaction text sentence is used to represent one interaction intent in the user's voice.
When the multi-intent recognition result is judged to be a plurality of single-intent interactive text sentences, the result is not empty and a plurality of single-intent interactive text sentences are obtained, which likewise indicates that the voice interaction text sentence is a multi-intent interactive text sentence.
In practical applications, the plurality of single-intent interactive text sentences may specifically include single-intent sentences within the field of device control and/or the field of living skills. For example, a single intent statement within the device control domain may include: turning on/off devices, device mode settings, device status queries, device attribute adjustments, etc.; single intent statements within the area of life skills may include device appointments, querying weather, playing music, setting an alarm clock, scheduling, calendar alarm clocks, querying calendars, querying time, search navigation, etc., as shown in fig. 3. Wherein the device reservation includes reserving or canceling a reservation for a particular function of the device.
After the plurality of single-intent interactive text sentences are obtained and each is converted into a corresponding standard single-intent interactive text sentence, the standard single-intent interactive text sentences can be successfully parsed and processed by the dialogue engine.
In one possible implementation, the embodiment of the application provides a specific implementation of converting each single-intent interactive text sentence into a corresponding standard single-intent interactive text sentence, comprising steps C1-C4:
C1: identifying the domain classification result of each single-intent interactive text sentence.
In practical applications, the plurality of single-intent interactive text sentences may include fuzzy single-intent interactive text sentences and/or standard single-intent interactive text sentences.
At this time, the individual single-intent interactive text sentences may be domain-classified. Since in the embodiment of the present application, the domain classification process is a second domain classification process, the domain classification process may also be referred to as a second domain classification.
Specifically, the domain classifier in the above embodiment can be used to implement domain classification of each single-intent interactive text sentence, and obtain a domain classification result of each single-intent interactive text sentence.
C2: determining whether the domain classification result of the single intention interactive text sentence belongs to the target domain.
The domain classification result of each single-intent interactive text sentence is examined to judge whether it belongs to a target domain (i.e., the device control domain or the life skills domain).
C3: when the domain classification result of the single-intent interactive text sentence is judged to belong to a target domain, determining the single-intent interactive text sentence to be a standard single-intent interactive text sentence.
When the domain classification result of a single-intent interactive text sentence belongs to a target domain, the sentence is already a standard single-intent interactive text sentence. In this case no conversion is needed, and the single-intent interactive text sentence is directly determined to be a standard single-intent interactive text sentence.
The single intent interaction text sentence may then be entered into the dialog engine for subsequent processing, as shown in fig. 3.
C4: when the domain classification result of the single-intent interactive text sentence is judged not to belong to any target domain, determining the single-intent interactive text sentence to be a fuzzy single-intent interactive text sentence, and converting it into a standard single-intent interactive text sentence.
When the domain classification result of a single-intent interactive text sentence does not belong to any target domain, the sentence, which contains only a single intent, can be determined to be a fuzzy single-intent interactive text sentence. In this case it needs to be converted to obtain the corresponding standard single-intent interactive text sentence.
For example, the single-intent interactive text sentence "too hot" is converted into the standard single-intent interactive text sentence "turn on the air conditioner"; the single-intent interactive text sentence "is it still snowing in Beijing this month" is converted into "query Beijing weather". The converted standard single-intent interactive text sentences are regular interactive sentences that the dialogue engine can successfully parse and that can directly control the terminal equipment.
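The routing logic of steps C1-C4 can be sketched as below; the classifier and the normalization call are stubs standing in for the domain classifier and for the conversion applied in the following paragraphs, and the keyword rules are purely illustrative:

```python
TARGET_DOMAINS = {"device control", "life skills"}

def classify_domain(sentence: str) -> str:
    """Stub for the second domain classification (C1)."""
    return "device control" if "air conditioner" in sentence else "other"

def normalize(sentence: str) -> str:
    """Stub for converting a fuzzy sentence into a standard one."""
    return {"too hot": "turn on the air conditioner"}.get(sentence, sentence)

def to_standard(sentence: str) -> str:
    """C2-C4: keep sentences already in a target domain; otherwise treat
    the sentence as fuzzy and convert it."""
    if classify_domain(sentence) in TARGET_DOMAINS:  # C2, C3
        return sentence
    return normalize(sentence)                       # C4

standardized = to_standard("too hot")
# standardized == "turn on the air conditioner"
```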
In one possible implementation manner, in connection with fig. 3, an embodiment of the present application provides a specific implementation manner of converting a fuzzy single-intent interaction text sentence into a standard single-intent interaction text sentence in C4, including:
inputting the single-intention interactive text sentence into a sentence standardization GPT model, and obtaining a standard single-intention interactive text sentence output by the sentence standardization GPT model.
The sentence standardization GPT model is applied based on a sentence standardization prompt word construction mechanism, so that single-intention interactive text sentences are generalized based on a prompt learning method, and corresponding standard single-intention interactive text sentences are obtained.
Illustratively, inputting the single-intent interactive text sentence into the sentence standardization GPT model and obtaining the standard single-intent interactive text sentence it outputs comprises steps D1-D3:
D1: constructing a second prompt word based on the single-intent interactive text sentence, the task description sentence converted into the standard single-intent interactive text sentence in the target field, the description sentence in the target field and the sample example of the task converted into the standard single-intent interactive text sentence in the target field; sample examples of tasks converted into standard single-intent interactive text statements of the target domain include single-intent interactive text statement examples of the target domain and standard single-intent interactive text statement examples converted from the single-intent interactive text statement examples.
The second prompt word is constructed to implement prompt learning and may be denoted as Prompt; the process of constructing it can be understood as a sentence standardization prompt word construction mechanism. Through it, the sentence standardization GPT model can convert the single-intent interactive text sentence into a standard single-intent interactive text sentence of a target domain based on the prompt learning method.
A second hint word example is provided below:
Prompt = "Generalize the sentence into a standard sentence conforming to one of the provided preset domains; note that if the original sentence cannot be generalized to any preset domain, return null. The preset domains and their descriptions are: \n {\"device control\": \"turning devices on/off, device mode settings, device status queries, device attribute adjustments, etc.\", \"device reservation\": \"reserving or canceling a reservation for a specific device function\", \"life skills\": \"querying weather, playing music, setting an alarm, scheduling, querying the calendar, querying the time, etc.\"} \n For example: \"So hot\" -> \"turn on the air conditioner\"; \"It's a bit dark in the house\" -> \"turn on the light\"; \"Is it still snowing in Beijing this month\" -> \"query Beijing weather\"; \"The house is too damp\" -> \"air conditioner set to dehumidify mode\"; \"Reserve a bath at 8\" -> \"water heater starts at 8\" \n Based on the above information, answer the following question: \n <input_query> \n Output: ".
Here, "Generalize the sentence into a standard sentence conforming to one of the provided preset domains; note that if the original sentence cannot be generalized to any preset domain, return null" is the task description sentence for conversion into a standard single-intent interactive text sentence of a target domain.
The preset domain set and its descriptions contain the target domains and their description sentences, so that prompt learning is performed only for the target domains. The device control class and the device reservation class both belong to the device control domain: "turning devices on/off, device mode settings, device status queries, device attribute adjustments, etc." is the description sentence of the device control class, "reserving or canceling a reservation for a specific device function" is the description sentence of the device reservation class, and "querying weather, playing music, setting an alarm, scheduling, querying the calendar, querying the time, etc." is the description sentence of the life skills class.
After "For example", "So hot", "It's a bit dark in the house", "Is it still snowing in Beijing this month", "The house is too damp" and "Reserve a bath at 8" are single-intent interactive text sentence examples of the target domains, while "turn on the air conditioner", "turn on the light", "query Beijing weather", "air conditioner set to dehumidify mode" and "water heater starts at 8" are the standard single-intent interactive text sentence examples converted from them. The single-intent interactive text sentence to be converted is filled in at <input_query>.
In practical applications, the process of obtaining sample examples of the task of conversion into standard single-intent interactive text sentences of a target domain may refer to E1-E3 and F1-F3 below, and is not described again here.
D2: and inputting the second prompt word into a sentence standardization GPT model, so that the sentence standardization GPT model analyzes the single-intention interactive text sentence according to the task description sentence converted into the standard single-intention interactive text sentence in the target field, the description sentence in the target field and the sample example of the task converted into the standard single-intention interactive text sentence in the target field.
In practical application, the sentence standardization GPT model determines the task to be executed according to the task description sentence for conversion into a standard single-intent interactive text sentence of a target domain, and learns from the target domain, its description sentence, and the sample examples of the conversion task in order to parse the single-intent interactive text sentence. This parsing process is the process of converting it into the corresponding standard single-intent interactive text sentence.
D3: and obtaining a standard single-intention interactive text sentence obtained by analyzing the sentence standardization GPT model.
After learning by taking the target field, the description sentence of the target field and the sample example of the task converted into the standard single-intention interactive text sentence of the target field as references, the sentence standardization GPT model can analyze the standard single-intention interactive text sentence corresponding to the single-intention interactive text sentence.
Through steps D1-D3, the sentence standardization GPT model can convert the single-intent interactive text sentence into the corresponding standard single-intent interactive text sentence based on prompt learning.
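The construction of the second prompt word in D1 can be sketched in the same style (the helper and its names are illustrative; the domain descriptions and conversion pairs follow the example above):

```python
def build_second_prompt(task_desc, domain_descs, examples, query):
    """Assemble the second prompt word: task description, preset domain
    set with descriptions, conversion examples, and the single-intent
    interactive text sentence to be standardized."""
    domains = "; ".join(f'"{d}": {desc}' for d, desc in domain_descs.items())
    shots = "; ".join(f'"{s}" -> "{t}"' for s, t in examples)
    return (
        f"{task_desc}\n"
        f"The preset domains and their descriptions are: {domains}\n"
        f"For example: {shots}\n"
        "Based on the above information, answer the following question:\n"
        f'"{query}"\nOutput: '
    )

p = build_second_prompt(
    "Generalize the sentence into a standard sentence; return null if "
    "it fits no preset domain.",
    {"device control": "turning devices on/off, mode settings, etc."},
    [("So hot", "turn on the air conditioner")],
    "The house is too damp",
)
```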
It will be appreciated that, as shown in fig. 3, when the multi-intent recognition result is judged not to be a plurality of single-intent interactive text sentences, the voice interaction text sentence is determined to be a fuzzy single-intent interactive text sentence. Further, in response to the voice interaction text sentence being a fuzzy single-intent interactive text sentence, it is converted into a standard single-intent interactive text sentence.
Illustratively, converting the voice interactive text statement into a standard single-intent interactive text statement includes:
inputting the voice interaction text sentence into the sentence standardization GPT model to obtain the standard single-intent interactive text sentence output by the sentence standardization GPT model; the sentence standardization GPT model is applied based on a sentence standardization prompt word construction mechanism.
It can be appreciated that the process of converting the voice interactive text sentence into the corresponding standard single-intention interactive text sentence by the sentence standardization GPT model may refer to D1-D3, and will not be described herein.
Based on the above, in order to recognize, mine and parse the multiple intents in the user's voice, the embodiment of the application couples the multi-intent recognition GPT model with the dialogue system, parses and mines multiple intents from the user's voice interaction text sentence to obtain a plurality of standard single-intent interactive text sentences, thereby improving the intelligent interaction effect of the existing dialogue system and meeting users' needs in a more humanized manner.
In a possible implementation manner, the method for identifying multiple intention of a sentence based on a model provided by the embodiment of the application further includes: a sample instance of a multi-intent recognition task is obtained. A sample example of the multi-intent recognition task is used to construct the first prompt in the above embodiment, where the first prompt is used to combine the multi-intent recognition GPT model to obtain a plurality of single-intent interaction text sentences of the voice interaction text sentence.
The embodiments of the present application provide two specific implementations for obtaining sample examples of multi-intent recognition tasks, see in particular below.
In one possible implementation, a sample instance of a multi-intent recognition task is obtained, comprising steps E1-E3:
E1: acquiring a plurality of historical single-intent interactive text sentences, and constructing at least one historical multi-intent interactive text sentence based on them; a historical multi-intent interactive text sentence together with the plurality of historical single-intent interactive text sentences parsed from it forms one sample example.
Specifically, multi-intent interaction text sentences from historical interactions between the current user (or other users) and the smart home terminal device are obtained, and each historical multi-intent interaction text sentence is decomposed to obtain a plurality of historical single-intent interaction text sentences.
Further, at least one historical multi-intent interaction text sentence can be constructed by permuting and combining the plurality of historical single-intent interaction text sentences. At the same time, the decomposition result of each historical multi-intent interaction text sentence is obtained, namely the historical single-intent interaction text sentences used in the permutation and combination.
From the above, a large number of sample examples applicable to the "multi-intent mining and decomposition task" described above (i.e., sample examples of the multi-intent recognition task) can be generated for use in the first prompt word. Each sample example comprises a historical multi-intent interaction text sentence and the plurality of historical single-intent interaction text sentences obtained by parsing it. At least one sample example may constitute an example data set of the multi-intent mining and decomposition task, denoted D_M.
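The construction in E1 — permuting and combining historical single-intent sentences into synthetic multi-intent sentences, each paired with its decomposition — can be sketched as follows. This is a minimal illustration: the connective used to join sentences is a hypothetical placeholder, and real data would use conjunctions observed in user speech.

```python
from itertools import permutations

def build_sample_examples(single_intent_sentences, max_intents=3, joiner=", then "):
    """Construct multi-intent sentences (and their decompositions) by
    permuting historical single-intent sentences, per step E1."""
    examples = []
    for r in range(2, max_intents + 1):
        for combo in permutations(single_intent_sentences, r):
            # Synthetic multi-intent sentence plus its known decomposition.
            examples.append({"multi": joiner.join(combo), "singles": list(combo)})
    return examples

# Two single-intent sentences yield two length-2 permutations.
dataset_dm = build_sample_examples(["turn on the AC", "close the curtains"])
print(len(dataset_dm))  # 2
```

Each generated example pairs a constructed multi-intent sentence with the single-intent sentences it decomposes into, which is exactly the shape a sample example in D_M needs.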
E2: and obtaining the similarity between the voice interaction text sentence and the historical multi-intention interaction text sentence in each sample example, sorting the similarity, and screening at least one target sample example from at least one sample example based on the sorted similarity.
In order to better analyze the voice interaction text sentence, k sample examples suitable for the current multi-intent mining and decomposition task are selected from the large number of constructed sample examples, so as to guide the multi-intent recognition GPT model to perform effective multi-intent mining and decomposition on the voice interaction text sentence. The k sample examples are determined by screening k historical multi-intent interaction text sentences.
Illustratively, a semantic similarity model is constructed for evaluating semantic similarity between any two sentences. The internal structure and training process of the semantic similarity model are not limited, and can be set according to actual conditions.
The voice interaction text sentence can be understood as the interaction sentence to be parsed, denoted q_0. Through the semantic similarity model, the k historical multi-intent interaction text sentences most similar to q_0 are retrieved from the constructed example data set D_M and ranked in descending order of similarity, denoted q_1, …, q_k after ranking. Here, the semantic similarity between q_1 and q_0 is the highest, and k is a positive integer. The k sample examples to which these k historical multi-intent interaction text sentences belong are the k target sample examples.
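A minimal sketch of this retrieval step. The text leaves the semantic similarity model unspecified, so a simple bag-of-words cosine similarity stands in for it here; the sentences are illustrative placeholders.

```python
import math
from collections import Counter

def cosine_sim(a, b):
    """Bag-of-words cosine similarity — a stand-in for the (unspecified)
    semantic similarity model."""
    va, vb = Counter(a.split()), Counter(b.split())
    dot = sum(va[w] * vb[w] for w in va)
    na = math.sqrt(sum(v * v for v in va.values()))
    nb = math.sqrt(sum(v * v for v in vb.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve_top_k(q0, examples, k):
    """Rank sample examples by similarity of their multi-intent sentence
    to the sentence q0 to be parsed, descending, and keep the top k."""
    ranked = sorted(examples, key=lambda ex: cosine_sim(q0, ex["multi"]), reverse=True)
    return ranked[:k]

examples = [{"multi": "turn on the light"},
            {"multi": "turn on the AC and close the curtains"}]
top = retrieve_top_k("turn on the AC and open the curtains", examples, k=1)
print(top[0]["multi"])  # the AC/curtains example ranks first
```

In practice the lexical similarity here would be replaced by a trained semantic model, but the top-k ranking logic is the same.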
E3: at least one target sample instance is determined as a sample instance of the multi-intent recognition task.
In this way, the at least one target sample example obtained by screening can be determined as sample examples of the multi-intent recognition task.
Based on E1-E3, the target sample examples have a higher degree of similarity to the voice interaction text sentence; using them as the sample examples of the multi-intent recognition task to construct the first prompt word helps improve the multi-intent parsing accuracy and mining accuracy of the multi-intent recognition GPT model on the voice interaction text sentence.
In practical application, directly backfilling all k target sample examples obtained through E1-E3 into the first prompt word may raise two problems. On one hand, there are often problems such as excessive computation cost and poor handling of long texts caused by the context length limitation; on the other hand, the q_1, …, q_k obtained by similarity retrieval may be generated by permuting and combining the same few historical single-intent interaction text sentences, which means the retrieved q_1, …, q_k are informationally redundant.
For this purpose, the obtained target sample examples need to be further screened. Based on this, in one possible implementation manner, the embodiment of the present application provides a specific implementation of determining at least one target sample example as a sample example of the multi-intent recognition task in E3, including E31-E36:
e31: a sample instance set is constructed that includes at least one target sample instance.
The sample example set comprises the k target sample examples; each target sample example comprises one historical multi-intent interaction text sentence, so the k target sample examples comprise k historical multi-intent interaction text sentences in total.
E32: determining target historical multi-intention interaction text sentences with highest similarity with voice interaction text sentences in all target sample examples, determining the target sample examples of the target historical multi-intention interaction text sentences as sample examples of multi-intention recognition tasks, and taking the target historical multi-intention interaction text sentences as current spliced sentences.
Since the k historical multi-intent interaction text sentences most similar to the voice interaction text sentence are denoted q_1, …, q_k, the target historical multi-intent interaction text sentence with the highest similarity to the voice interaction text sentence is q_1. That is, the initial current spliced sentence is q_1.
And, the target sample example to which the target historical multi-intent interaction text sentence q_1 belongs is determined as a sample example of the multi-intent recognition task.
E33: calculating target average self-information of the current spliced sentence; the target average self-information is used for measuring the information quantity of the current spliced statement.
In information theory, self-information can be used to measure the uncertainty of an event, i.e., the amount of information the event contains. Each token in the current spliced sentence has corresponding self-information, and the tokens together form the current spliced sentence. The average self-information of the current spliced sentence can thus be obtained and used to evaluate its information content. The initial current spliced sentence is the target historical multi-intent interaction text sentence q_1.
The expression of the target average self-information H(S) of the current spliced sentence under the language model is as follows:

H(S) = (1/n) · Σ_{t=0}^{n-1} I(x_t)

where S is the current spliced sentence, consisting of the tokens (x_0, …, x_{n-1}), i.e., S = (x_0, …, x_{n-1}); initially, S = q_1 and H(S) = H(q_1). n is the number of tokens in the current spliced sentence, x_t is the t-th token (word element), I(x_t) is the self-information of x_t, and x_0, …, x_{t-1} are the first t tokens of the current spliced sentence.
For a generative language model, given the first t tokens (denoted x_0, …, x_{t-1}), the predicted probability distribution of the next token x_t is P(x_t | x_0, …, x_{t-1}), and the self-information of x_t can then be calculated as I(x_t) = -log2 P(x_t | x_0, …, x_{t-1}).
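These definitions can be checked with a short computation. The conditional token probabilities below are illustrative placeholders rather than outputs of a real language model:

```python
import math

def avg_self_information(token_probs):
    """Average self-information H(S) of a spliced sentence, given the
    conditional probability P(x_t | x_0,…,x_{t-1}) for each token:
    I(x_t) = -log2 P(x_t | x_0,…,x_{t-1});  H(S) = (1/n) * sum_t I(x_t)."""
    n = len(token_probs)
    return sum(-math.log2(p) for p in token_probs) / n

# Four tokens, each with conditional probability 0.25:
# I(x_t) = -log2(0.25) = 2 bits per token, so H(S) = 2 bits.
print(avg_self_information([0.25, 0.25, 0.25, 0.25]))  # 2.0
```

In the procedure above, these per-token probabilities would come from the generative language model's next-token predictions over the spliced sentence.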
E34: traversing the rest target sample examples in the sample example set, performing sentence splicing on the current spliced sentences and the historical multi-intention interactive text sentences in the traversed target sample examples to obtain spliced text sentences, calculating average self-information of the spliced text sentences, and differentiating the average self-information of the spliced text sentences with the target average self-information to obtain average self-information gain corresponding to the traversed target sample examples.
Taking the initial execution of E34 as an example: at that point, the remaining target sample examples in the sample example set are the target sample examples other than the one containing q_1, and the remaining historical multi-intent interaction text sentences in the sample example set are q_2, …, q_k.
Traversing the remaining target sample examples in the sample example set is essentially traversing the remaining historical multi-intent interaction text sentences q_2, …, q_k. Each traversed historical multi-intent interaction text sentence can be denoted q_i, where q_i ∈ {q_2, …, q_k}.
Sentence splicing is performed between the target historical multi-intent interaction text sentence and the traversed historical multi-intent interaction text sentence to obtain a spliced text sentence. That is, q_1 is spliced with the traversed q_i, and the spliced text sentence is denoted [q_1, q_i]. The average self-information of the spliced text sentence is calculated and denoted H([q_1, q_i]), i = 2, …, k; its calculation is analogous to that of H(q_1) and is not repeated here.
Further, the average self-information of the spliced text sentence is differenced with the target average self-information to obtain the average self-information gain corresponding to the traversed target sample example: ΔH_i = H([q_1, q_i]) − H(q_1).
E35: and screening the target sample examples with highest average self-information gain from the rest target sample examples, determining the target sample example with the highest average self-information gain as the sample example of the multi-intention recognition task when the highest average self-information gain is larger than a preset average self-information gain threshold, and removing the target sample example with the highest average self-information gain from the sample example set.
The highest average self-information gain can be expressed as ΔH_t = max_i ΔH_i, where the index t of the historical multi-intent interaction text sentence in the target sample example with the highest average self-information gain is t = argmax_i (H([q_1, q_i]) − H(q_1)). The historical multi-intent interaction text sentence in the target sample example with the highest average self-information gain is thus q_t.
In practical application, the average self-information gain threshold is set to δ. If the gain ΔH_t corresponding to q_t is greater than δ, the target sample example where q_t is located is determined as a sample example of the multi-intent recognition task. Thus, q_t is the historical multi-intent interaction text sentence that maximizes the information content of the spliced text sentence.
After the target sample example where q_t is located is determined as a sample example of the multi-intent recognition task, the target sample example with the highest average self-information gain (i.e., the one where q_t is located) is removed from the sample example set.
E36: and re-determining the text sentence after splicing the current spliced sentence and the historical multi-intention interactive text sentence in the target sample example with the highest average self-information gain as the current spliced sentence, and re-executing the steps of calculating the target average self-information of the current spliced sentence and the follow-up step until the average self-information gain is smaller than or equal to the average self-information gain threshold value.
Taking the initial execution of E36 as an example: [q_1, q_t] is re-determined as the current spliced sentence, and E33 and the subsequent steps are re-executed, until the highest calculated average self-information gain is less than or equal to δ.
It can be appreciated that each selected q_t is the historical multi-intent interaction text sentence that most increases the information content of the spliced text sentence (i.e., yields the largest average self-information gain). In this way, m sentences are obtained by screening, where m is a positive integer and m < k; the m sentences consist of q_1 together with each q_t selected in turn.
For example, suppose q_1, …, q_k are specifically q_1, q_2, q_3, q_4. The current spliced sentence is q_1, and the target sample example to which q_1 belongs is determined as a sample example of the multi-intent recognition task. The target average self-information of q_1 is calculated; q_2, q_3, and q_4 are each spliced with q_1, giving the spliced text sentences [q_1, q_2], [q_1, q_3], and [q_1, q_4], whose average self-information is H([q_1, q_2]), H([q_1, q_3]), and H([q_1, q_4]), respectively. The corresponding average self-information gains are ΔH_2 = H([q_1, q_2]) − H(q_1), ΔH_3 = H([q_1, q_3]) − H(q_1), and ΔH_4 = H([q_1, q_4]) − H(q_1). Suppose the highest average self-information gain is ΔH_2; then q_t at this time is q_2. When ΔH_2 is greater than δ, the target sample example to which q_2 belongs is determined as a sample example of the multi-intent recognition task and is removed from the sample example set. At this point, the current spliced sentence is updated to [q_1, q_2].
Furthermore, the target average self-information of the current spliced sentence [q_1, q_2] is recalculated; the remaining unspliced q_3 and q_4 are each spliced with [q_1, q_2], giving [q_1, q_2, q_3] and [q_1, q_2, q_4], whose average self-information is H([q_1, q_2, q_3]) and H([q_1, q_2, q_4]). The corresponding average self-information gains are ΔH_3 = H([q_1, q_2, q_3]) − H([q_1, q_2]) and ΔH_4 = H([q_1, q_2, q_4]) − H([q_1, q_2]). Suppose the highest average self-information gain is ΔH_3; then q_t at this time is q_3. When ΔH_3 is greater than δ, the target sample example to which q_3 belongs is determined as a sample example of the multi-intent recognition task and is removed from the sample example set. At this point, the current spliced sentence is updated to [q_1, q_2, q_3].
Further, the target average self-information of the current spliced sentence [q_1, q_2, q_3] is recalculated, and the remaining unspliced q_4 is spliced with it, giving [q_1, q_2, q_3, q_4], whose average self-information is H([q_1, q_2, q_3, q_4]). The corresponding average self-information gain is ΔH_4 = H([q_1, q_2, q_3, q_4]) − H([q_1, q_2, q_3]). If ΔH_4 is less than or equal to δ, the procedure ends. The m sentences obtained are specifically q_1, q_2, q_3, and m is 3. The sample examples of the multi-intent recognition task include the target sample examples to which q_1, q_2, and q_3 respectively belong.
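The greedy screening procedure of E31-E36 can be sketched as follows. The `avg_info` callable standing in for the language-model-based H(·), and the numeric values in the usage example, are illustrative assumptions only:

```python
def greedy_select(q_sorted, avg_info, delta):
    """Greedy example selection per E31-E36: start from the most similar
    sentence q1, then repeatedly append the candidate whose concatenation
    yields the highest average self-information gain, stopping once the
    best gain drops to delta or below."""
    selected = [q_sorted[0]]                 # q1 is always kept (E32)
    remaining = list(q_sorted[1:])
    while remaining:
        base = avg_info(tuple(selected))     # E33: H of current spliced sentence
        # E34: gain of splicing each remaining candidate onto the selection.
        gains = {q: avg_info(tuple(selected + [q])) - base for q in remaining}
        best = max(gains, key=gains.get)
        if gains[best] <= delta:             # E35/E36 stopping condition
            break
        selected.append(best)
        remaining.remove(best)
    return selected

# Hypothetical H values (in bits) for each spliced sentence.
H = {("q1",): 1.0, ("q1", "q2"): 2.0, ("q1", "q3"): 1.5, ("q1", "q4"): 1.2,
     ("q1", "q2", "q3"): 2.6, ("q1", "q2", "q4"): 2.2,
     ("q1", "q2", "q3", "q4"): 2.8}
print(greedy_select(["q1", "q2", "q3", "q4"], H.__getitem__, delta=0.3))
# ['q1', 'q2', 'q3'] — q4's final gain (0.2) falls below delta
```

With these assumed values the run reproduces the worked example: q2 (gain 1.0) and q3 (gain 0.6) are kept, while q4 (gain 0.2 ≤ δ = 0.3) is rejected.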
By backfilling the sample examples of the multi-intent recognition task obtained through E1-E3 and E31-E36 into the constructed first prompt word, a characteristic prompt word tailored to the voice interaction text sentence can be generated dynamically. Moreover, the obtained sample examples of the multi-intent recognition task are substantially free of information redundancy.
In one possible implementation manner, the embodiment of the present application further provides another specific implementation manner of obtaining a sample example of the multi-intent recognition task, which includes steps F1-F3:
f1: constructing a sample example library; the sample example library comprises a plurality of sample examples, and the sample examples comprise a plurality of multi-intention interaction text sentences and a plurality of single-intention interaction text sentences which are obtained by analyzing the multi-intention interaction text sentences.
F2: identifying a target type entity of the voice interaction text sentence, and acquiring a target entity with a preset association relationship with the target type entity; the target type entity comprises a device name class entity and/or a skill name class entity; the preset association relationship includes the same relationship or the item execution association relationship.
The device name class entity is, for example, the name of the intelligent terminal device such as a refrigerator, an air conditioner, and the like, and is not limited herein. The skill name class entity is, for example, the name of a living skill such as weather, alarm clock, calendar, etc., and is not limited herein.
The item execution association relationship may include, for example, an item execution precedence relationship, an item simultaneous-execution relationship, an item association relationship, and the like, which are not limited herein. For example, if closing the curtain and turning on the light have an execution precedence relationship, then the entity "curtain" and the entity "light" have an item execution precedence relationship. As another example, if querying the weather and querying a route are executed simultaneously, the entity "weather" and the entity "route" have an item simultaneous-execution relationship. For another example, if an 8 a.m. alarm clock and the water heater are associated items, the entity "alarm clock" and the entity "water heater" have an item association relationship.
In practical application, the entity recognition model can be used for recognizing the target type entity of the voice interaction text sentence, and the specific structure and training process of the entity recognition model are not limited.
F3: searching sample examples comprising target entities from a sample example library, and determining the searched sample examples as sample examples of the multi-intention recognition task.
For example, the voice interactive text sentence is "open air conditioner and close curtain", wherein the target type entities include "air conditioner" and "curtain", and further the target entities identical to the target type entities are determined, that is, the target entities are "air conditioner" and "curtain". Searching a sample example comprising air conditioner and a sample example comprising curtain from a sample example library, and determining the searched sample example as a sample example of the multi-intention recognition task.
Also, a target entity having an item execution association relationship with the target type entity may be determined. For example, if a light is usually turned on after the curtain is closed, it may be determined that a target entity having an item execution association with "curtain" is "light". In this case, sample examples including "light" are searched from the sample example library, and the retrieved sample examples are also determined as sample examples of the multi-intent recognition task.
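A minimal sketch of the F1-F3 lookup, with a hypothetical association table standing in for the preset association relationships (both the table entries and the library contents are illustrative assumptions):

```python
# Hypothetical association table: each entity maps to itself (same-name
# relationship) plus entities with an item execution association.
ASSOCIATED = {
    "curtain": {"curtain", "light"},   # lights often follow curtain operations
    "AC": {"AC"},
}

def find_task_examples(target_entities, example_library):
    """F1-F3 sketch: expand each recognized entity via the association
    table, then keep library examples mentioning any expanded entity."""
    expanded = set()
    for entity in target_entities:
        expanded |= ASSOCIATED.get(entity, {entity})
    return [ex for ex in example_library
            if any(ent in ex["entities"] for ent in expanded)]

library = [{"multi": "turn on the light and play music", "entities": {"light"}},
           {"multi": "set an alarm", "entities": {"alarm"}}]
hits = find_task_examples({"AC", "curtain"}, library)
print(len(hits))  # 1 — only the "light" example is associated
```

In a real system the association table would be populated from observed co-occurrence of device and skill entities, and entity recognition would supply `target_entities`.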
It can be understood that a certain relation exists between the sample examples of the multi-intention recognition tasks determined based on the F1-F3 mode and the voice interaction text sentences, so that the multi-intention analysis effect of the multi-intention recognition GPT model based on prompt learning can be improved.
It will be appreciated by those skilled in the art that in the above-described method of the specific embodiments, the written order of steps is not meant to imply a strict order of execution but rather should be construed according to the function and possibly inherent logic of the steps.
Based on the method provided by the embodiment of the method, the embodiment of the application also provides a sentence multi-intention recognition device based on a model, and the sentence multi-intention recognition device based on the model is described below with reference to the accompanying drawings. Because the principle of solving the problem by the device in the embodiment of the present disclosure is similar to that of the method for identifying multiple intent of the sentence based on the model in the embodiment of the present disclosure, the implementation of the device may refer to the implementation of the method, and the repetition is not repeated.
Referring to fig. 4, the diagram is a schematic structural diagram of a device for identifying multiple intent of a sentence based on a model according to an embodiment of the present application. As shown in fig. 4, the model-based sentence multi-intention recognition apparatus includes:
a first determining unit 401 for determining whether a domain classification result of a voice interaction text sentence obtained by voice recognition of a user belongs to a target domain;
A first obtaining unit 402, configured to input the voice interaction text sentence into a multi-intent recognition GPT model when it is determined that the domain classification result of the voice interaction text sentence does not belong to the target domain, and obtain a multi-intent recognition result output by the multi-intent recognition GPT model; the multi-intent recognition GPT model is trained based on a model supervision fine-tuning mode and/or is applied based on a multi-intent recognition prompt word construction mechanism;
a second determining unit 403, configured to determine whether the multi-intent recognition result is a plurality of single-intent interactive text sentences;
A first converting unit 404, configured to convert each single-intent interaction text sentence into a corresponding standard single-intent interaction text sentence when it is determined that the multi-intent recognition result is a plurality of single-intent interaction text sentences; wherein each of the standard single-intent interactive text statements is used to represent one of the interactive intents in the user's speech.
In one possible implementation, the first conversion unit 404 includes:
The first recognition subunit is used for recognizing the domain classification result of each single-intention interactive text sentence;
a judging subunit, configured to judge whether a domain classification result of the single-intent interactive text sentence belongs to the target domain;
A first determining subunit, configured to determine, when it is determined that a domain classification result of the single-intent interaction text sentence belongs to the target domain, that the single-intent interaction text sentence is a standard single-intent interaction text sentence;
and the second determining subunit is used for determining the single-intention interactive text sentence as a fuzzy single-intention interactive text sentence and converting the fuzzy single-intention interactive text sentence into a standard single-intention interactive text sentence when the field classification result of the single-intention interactive text sentence is judged not to belong to the target field.
In one possible implementation, the apparatus further includes:
The first determining unit is used for determining that the voice interaction text sentence is a standard single-intention interaction text sentence when the field classification result of the voice interaction text sentence is recognized to belong to the target field;
And the sending unit is used for sending the voice interaction text sentence to a dialogue engine so that the dialogue engine can perform semantic analysis and/or dialogue management on the voice interaction text sentence.
In one possible implementation, the apparatus further includes:
A second determining unit configured to determine, when it is determined that the multi-intent recognition result is not the plurality of single-intent interactive text sentences, that the voice interactive text sentences are ambiguous single-intent interactive text sentences;
and the second conversion unit is used for converting the voice interaction text sentence into a standard single-intention interaction text sentence in response to the voice interaction text sentence being a fuzzy single-intention interaction text sentence.
In one possible implementation, the apparatus further includes:
A second obtaining unit, configured to obtain a training data set including training sample data before inputting the voice interaction text sentence into a multi-purpose recognition GPT model; the training sample data comprises a description sentence of a multi-intention recognition task, an interactive text sentence example and a plurality of single-intention interactive text sentence examples which are obtained by analysis of the interactive text sentence example;
The splicing unit is used for splicing the description sentences of the multi-intention recognition tasks in the training sample data with the interactive text sentence examples to obtain spliced sentences;
an adding unit, configured to add mask marks to the spliced sentences to obtain model input sentences;
The input unit is used for inputting the model input sentence into a multi-intention recognition GPT model to obtain a plurality of prediction single-intention interactive text sentences output by the multi-intention recognition GPT model;
And the training unit is used for training the multi-intention recognition GPT model according to a plurality of the predicted single-intention interaction text sentences and a plurality of the single-intention interaction text sentence examples.
In one possible implementation manner, the first obtaining unit 402 includes:
A first construction subunit, configured to construct a first prompt word based on the voice interaction text sentence, the description sentence of the multi-intent recognition task, and the sample example of the multi-intent recognition task; the sample examples of the multi-intention recognition task comprise interactive text sentence examples and a plurality of single-intention interactive text sentence examples which are obtained by analysis of the interactive text sentence examples;
The first input subunit is used for inputting the first prompt word into a multi-intention recognition GPT model so that the multi-intention recognition GPT model analyzes the voice interaction text statement according to the description statement of the multi-intention recognition task and the sample example of the multi-intention recognition task;
the first acquisition subunit is used for acquiring a plurality of single-intention interaction text sentences obtained by analyzing the multi-intention recognition GPT model.
In one possible implementation, the apparatus further includes:
A third acquisition unit configured to acquire a sample example of the multi-intention recognition task; the sample example of the multi-intention recognition task is used for constructing a first prompt word, and the first prompt word is used for combining a multi-intention recognition GPT model to obtain a plurality of single-intention interaction text sentences of the voice interaction text sentences;
The third acquisition unit includes:
The second acquisition subunit is used for acquiring a plurality of historical single-intention interaction text sentences and constructing at least one historical multi-intention interaction text sentence based on the historical single-intention interaction text sentences; the historical multi-intention interactive text statement and a plurality of historical single-intention interactive text statements obtained by analyzing the historical multi-intention interactive text statement form a sample;
A third obtaining subunit, configured to obtain a similarity between the voice interaction text sentence and the historical multi-purpose interaction text sentence in each sample example, sort the similarity, and screen at least one target sample example from at least one sample example based on the sorted similarity;
a third determination subunit configured to determine the at least one target sample instance as a sample instance of the multi-intent recognition task.
In one possible implementation manner, the third determining subunit includes:
A second construction subunit for constructing a sample instance set comprising the at least one target sample instance;
A fourth determining subunit, configured to determine, from among the target sample examples, a target historical multi-intent interaction text sentence having a highest similarity to the voice interaction text sentence, determine a target sample example to which the target historical multi-intent interaction text sentence belongs as a sample example of a multi-intent recognition task, and use the target historical multi-intent interaction text sentence as a current spliced sentence;
The calculating subunit is used for calculating the target average self-information of the current spliced statement; the target average self-information is used for measuring the information quantity of the current spliced statement;
A traversing subunit, configured to traverse the rest of target sample examples in the sample example set, perform sentence splicing on the current spliced sentence and a historical multi-intention interactive text sentence in the traversed target sample example to obtain a spliced text sentence, calculate average self-information of the spliced text sentence, and make a difference between the average self-information of the spliced text sentence and the average self-information of the target to obtain an average self-information gain corresponding to the traversed target sample example;
a screening subunit, configured to screen, from the remaining target sample examples, the target sample example with the highest average self-information gain, determine, when the highest average self-information gain is greater than a preset average self-information gain threshold, the target sample example with the highest average self-information gain as a sample example of the multi-intent recognition task, and remove the target sample example with the highest average self-information gain from the sample example set;
and the execution subunit is used for redefining the text sentence after splicing the current spliced sentence and the historical multi-intention interactive text sentence in the target sample example with the highest average self-information gain into the current spliced sentence, and re-executing the steps of calculating the target average self-information of the current spliced sentence and the follow-up steps until the average self-information gain is smaller than or equal to the average self-information gain threshold.
In one possible implementation, the expression of the target average self-information H(S) is:

H(S) = (1/n) · Σ_{t=0}^{n-1} I(x_t)

where S is the current spliced sentence, n is the number of tokens in the current spliced sentence, x_t is the t-th token in the current spliced sentence, I(x_t) is the self-information of x_t, and x_0, …, x_{t-1} are the first t tokens of the current spliced sentence.
In one possible implementation manner, the third determining subunit includes:
A third construction subunit for constructing a sample instantiation library; the sample example library comprises a plurality of sample examples, wherein the sample examples comprise multi-intention interaction text sentences and a plurality of single-intention interaction text sentences which are obtained by analyzing the multi-intention interaction text sentences;
the second recognition subunit is used for recognizing the target type entity of the voice interaction text statement and acquiring a target entity with a preset association relationship with the target type entity; the target type entity comprises a device name class entity and/or a skill name class entity; the preset association relation comprises the same relation or an item execution association relation;
and the searching subunit is used for searching the sample examples comprising the target entity from the sample example library, and determining the searched sample examples as sample examples of the multi-intention recognition task.
In one possible implementation, the second determining subunit includes:
the second input subunit is used for inputting the single-intention interactive text sentence into a sentence standardization GPT model and obtaining a standard single-intention interactive text sentence output by the sentence standardization GPT model; the statement standardized GPT model is applied based on a statement standardized prompt word construction mechanism.
In one possible implementation, the second input subunit includes:
A fourth construction subunit, configured to construct a second prompt word based on the single-intent interaction text sentence, a task description sentence converted into a standard single-intent interaction text sentence in the target domain, a description sentence in the target domain, and a sample example of the task converted into a standard single-intent interaction text sentence in the target domain; sample examples of the task converted into the standard single-intention interaction text sentence in the target field comprise single-intention interaction text sentence examples in the target field and standard single-intention interaction text sentence examples obtained by converting the single-intention interaction text sentence examples;
a third input subunit, configured to input the second prompt word into a sentence normalization GPT model, so that the sentence normalization GPT model parses the single-intent interaction text sentence according to the task description sentence converted into the standard single-intent interaction text sentence of the target domain, the description sentence of the target domain, and the sample example of the task converted into the standard single-intent interaction text sentence of the target domain;
And the fourth acquisition subunit is used for acquiring the standard single-intention interactive text statement obtained by analyzing the statement standardized GPT model.
In one possible implementation manner, the converting the voice interaction text sentence into a standard single-intention interaction text sentence includes: inputting the voice interaction text sentence into a sentence standardization GPT model to obtain a standard single-intention interaction text sentence output by the sentence standardization GPT model; the sentence standardization GPT model is applied based on a sentence standardization prompt word construction mechanism.
In one possible implementation, the target domain includes a device control domain or a life skill domain.
It should be noted that, for the specific implementation of each unit in this embodiment, reference may be made to the related description in the above method embodiment. The division of units in the embodiments of the application is schematic and represents only one kind of logical function division; other division manners may be adopted in actual implementation. The functional units in the embodiments of the application may be integrated into one processing unit, each unit may exist alone physically, or two or more units may be integrated into one unit. For example, in the above embodiment, the processing unit and the transmitting unit may be the same unit or different units. The integrated units may be implemented in the form of hardware or in the form of software functional units.
From the above description of embodiments, it will be apparent to those skilled in the art that all or part of the steps of the above described example methods may be implemented in software plus necessary general purpose hardware platforms. Based on such understanding, the technical solution of the present application may be embodied essentially or in a part contributing to the prior art in the form of a software product, which may be stored in a storage medium, such as ROM/RAM, a magnetic disk, an optical disk, etc., including several instructions for causing a computer device (which may be a personal computer, a server, or a network communication device such as a media gateway, etc.) to execute the method described in the embodiments or some parts of the embodiments of the present application.
It should be noted that, in the present description, the embodiments are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and identical or similar parts among the embodiments may be referred to one another. Since the method disclosed in the embodiments corresponds to the system disclosed in the embodiments, its description is relatively brief, and for relevant details reference may be made to the system part.
It should also be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the application. Thus, the present application is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (14)

1. A method for model-based recognition of multiple intent of a sentence, the method comprising:
Judging whether the domain classification result of the voice interaction text sentence obtained by voice recognition of the user voice belongs to the target domain or not;
When the field classification result of the voice interaction text sentence is judged not to belong to the target field, inputting the voice interaction text sentence into a multi-intention recognition GPT model, and obtaining a multi-intention recognition result output by the multi-intention recognition GPT model; the multi-intention recognition GPT model is trained based on a model supervision fine tuning mode and/or is applied based on a multi-intention recognition prompt word construction mechanism;
Judging whether the multi-intention recognition result is a plurality of single-intention interactive text sentences or not;
When the multi-intention recognition result is judged to be a plurality of single-intention interactive text sentences, each single-intention interactive text sentence is converted into a corresponding standard single-intention interactive text sentence; wherein each of the standard single-intent interactive text sentences is used for representing one interactive intent in the user speech;
the converting each single-intention interactive text sentence into a corresponding standard single-intention interactive text sentence comprises the following steps:
identifying the domain classification result of each single-intention interactive text sentence;
judging whether the domain classification result of the single-intention interactive text sentence belongs to the target domain or not;
When the field classification result of the single-intention interactive text sentence is judged to belong to the target field, determining the single-intention interactive text sentence as a standard single-intention interactive text sentence;
when the field classification result of the single-intention interactive text sentence is judged not to belong to the target field, determining the single-intention interactive text sentence as a fuzzy single-intention interactive text sentence, and converting the fuzzy single-intention interactive text sentence into a standard single-intention interactive text sentence;
The converting the fuzzy single-intent interaction text sentence into a standard single-intent interaction text sentence comprises:
inputting the single-intention interactive text sentence into a sentence standardization GPT model, and obtaining a standard single-intention interactive text sentence output by the sentence standardization GPT model; the statement standardized GPT model is applied based on a statement standardized prompt word construction mechanism;
When the multi-intention recognition GPT model is trained based on a model supervision fine tuning mode, before the voice interaction text sentence is input into the multi-intention recognition GPT model, the method further comprises:
acquiring a training data set comprising training sample data; the training sample data comprises a description sentence of a multi-intention recognition task, an interactive text sentence example and a plurality of single-intention interactive text sentence examples which are obtained by analysis of the interactive text sentence example;
Splicing the description sentences of the multi-intention recognition tasks in the training sample data with the interactive text sentence examples to obtain spliced sentences;
Adding mask marks after the spliced sentences to obtain model input sentences;
inputting the model input sentence into a multi-intention recognition GPT model to obtain a plurality of prediction single-intention interactive text sentences output by the multi-intention recognition GPT model;
training the multi-intent recognition GPT model according to a plurality of the predicted single-intent interaction text sentences and a plurality of the single-intent interaction text sentence examples.
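The construction of a model input sentence described in claim 1 (splice the task description with the interaction example, then append a mask mark) can be sketched as follows; the field names and the `[MASK]` token are assumptions for illustration, not prescribed by the claim:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class TrainingSample:
    task_description: str       # description sentence of the multi-intention recognition task
    example_sentence: str       # interactive text sentence example
    single_intents: List[str]   # single-intent sentences parsed from the example (labels)

def build_model_input(sample: TrainingSample, mask_token: str = "[MASK]") -> str:
    # splice the task description with the interaction example to obtain a
    # spliced sentence, then add a mask mark after it to form the model input
    return sample.task_description + sample.example_sentence + mask_token
```

The predicted single-intent sentences generated at the mask position would then be compared against `single_intents` during training.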
2. The method according to claim 1, wherein the method further comprises:
When the field classification result of the voice interaction text sentence is recognized to belong to the target field, determining the voice interaction text sentence as a standard single-intention interaction text sentence;
And sending the voice interaction text sentence to a dialogue engine so that the dialogue engine performs semantic analysis and/or dialogue management on the voice interaction text sentence.
3. The method according to claim 1, wherein the method further comprises:
when the multi-intention recognition result is judged to be not the single-intention interactive text statement, determining that the voice interactive text statement is a fuzzy single-intention interactive text statement;
and converting the voice interaction text sentence into a standard single-intention interaction text sentence in response to the voice interaction text sentence being a fuzzy single-intention interaction text sentence.
4. The method of claim 1, wherein the inputting the voice interactive text sentence into a multi-intent recognition GPT model, obtaining a multi-intent recognition result output by the multi-intent recognition GPT model, comprises:
Constructing a first prompt word based on the voice interaction text sentence, the description sentence of the multi-intention recognition task and the sample example of the multi-intention recognition task; the sample examples of the multi-intention recognition task comprise interactive text sentence examples and a plurality of single-intention interactive text sentence examples which are obtained by analysis of the interactive text sentence examples;
Inputting the first prompt word into a multi-intention recognition GPT model, so that the multi-intention recognition GPT model analyzes the voice interaction text sentence according to the description sentence of the multi-intention recognition task and the sample example of the multi-intention recognition task;
And acquiring a multi-intention recognition result obtained by analyzing the multi-intention recognition GPT model.
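A minimal sketch of assembling the first prompt word from the three parts named in claim 4 (task description, few-shot sample examples, and the user's sentence); the textual layout and separators are assumptions, since the patent does not fix a concrete prompt format:

```python
def build_first_prompt(task_description, samples, user_sentence):
    """Assemble the first prompt word: the description sentence of the
    multi-intention recognition task, then few-shot sample examples
    (interactive sentence -> parsed single-intent sentences), then the
    voice interaction text sentence to be analyzed."""
    lines = [task_description]
    for example_in, intents in samples:
        lines.append(f"Input: {example_in}")
        lines.append("Output: " + " | ".join(intents))
    # the user's sentence goes last, leaving the output for the model to fill in
    lines.append(f"Input: {user_sentence}")
    lines.append("Output:")
    return "\n".join(lines)
```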
5. The method according to claim 1, wherein the method further comprises:
Acquiring a sample example of a multi-intention recognition task; the sample example of the multi-intention recognition task is used for constructing a first prompt word, and the first prompt word is used for combining a multi-intention recognition GPT model to obtain a plurality of single-intention interaction text sentences of the voice interaction text sentences;
The obtaining a sample example of the multi-intent recognition task includes:
Acquiring a plurality of historical single-intention interaction text sentences, and constructing at least one historical multi-intention interaction text sentence based on the plurality of historical single-intention interaction text sentences; the historical multi-intention interactive text statement and a plurality of historical single-intention interactive text statements obtained by analyzing the historical multi-intention interactive text statement form a sample;
obtaining the similarity between the voice interaction text sentence and the historical multi-purpose interaction text sentence in each sample example, sorting the similarity, and screening at least one target sample example from at least one sample example based on the sorted similarity;
the at least one target sample instance is determined to be a sample instance of a multi-intent recognition task.
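The similarity-based screening in claim 5 might be sketched as below; the similarity function is assumed to be supplied externally (e.g. cosine similarity over sentence embeddings), and the dictionary layout of a sample example is illustrative:

```python
def top_k_similar_examples(query, examples, similarity, k=3):
    """Rank sample examples by the similarity between the query sentence and
    each example's historical multi-intention interaction sentence, then keep
    the top k as target sample examples."""
    ranked = sorted(examples,
                    key=lambda ex: similarity(query, ex["multi"]),
                    reverse=True)
    return ranked[:k]
```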
6. The method of claim 5, wherein the determining the at least one target sample instance as a sample instance of a multi-intent recognition task comprises:
constructing a sample instance set comprising the at least one target sample instance;
Determining a target historical multi-intention interaction text sentence with highest similarity with the voice interaction text sentence in each target sample example, determining the target sample example to which the target historical multi-intention interaction text sentence belongs as a sample example of a multi-intention recognition task, and taking the target historical multi-intention interaction text sentence as a current spliced sentence;
calculating target average self-information of the current spliced statement; the target average self-information is used for measuring the information quantity of the current spliced statement;
Traversing the rest target sample examples in the sample example set, performing sentence splicing on the current spliced sentence and the historical multi-intention interactive text sentence in the traversed target sample example to obtain a spliced text sentence, calculating average self-information of the spliced text sentence, and performing difference between the average self-information of the spliced text sentence and the target average self-information to obtain average self-information gain corresponding to the traversed target sample example;
Screening the target sample example with the highest average self-information gain from the rest target sample examples, determining the target sample example with the highest average self-information gain as a sample example of a multi-intention recognition task when the highest average self-information gain is larger than a preset average self-information gain threshold, and removing the target sample example with the highest average self-information gain from the sample example set;
And re-determining the text sentence after the current spliced sentence is spliced with the historical multi-intention interactive text sentence in the target sample example with the highest average self-information gain as the current spliced sentence, and re-executing the steps of calculating the target average self-information of the current spliced sentence and the follow-up steps until the average self-information gain is smaller than or equal to the average self-information gain threshold value.
7. The method of claim 6, wherein the target average self-information H(S) is expressed as:
H(S) = (1/n) · Σ_{t=1}^{n} I(x_t), where I(x_t) = −log p(x_t | x_0, …, x_{t−1});
where S is the current spliced sentence, n is the number of tokens in the current spliced sentence, x_t is the t-th token in the current spliced sentence, I(x_t) is the self-information of x_t, and x_0, …, x_{t−1} are the first t tokens of the current spliced sentence.
8. The method according to claim 1, wherein the method further comprises:
Acquiring a sample example of a multi-intention recognition task; the sample example of the multi-intention recognition task is used for constructing a first prompt word, and the first prompt word is used for combining a multi-intention recognition GPT model to obtain a plurality of single-intention interaction text sentences of the voice interaction text sentences;
The obtaining a sample example of the multi-intent recognition task includes:
Constructing a sample example library; the sample example library comprises a plurality of sample examples, wherein the sample examples comprise multi-intention interaction text sentences and a plurality of single-intention interaction text sentences which are obtained by analyzing the multi-intention interaction text sentences;
Identifying a target type entity of the voice interaction text sentence, and acquiring a target entity with a preset association relationship with the target type entity; the target type entity comprises a device name class entity and/or a skill name class entity; the preset association relation comprises the same relation or an item execution association relation;
and searching sample examples comprising the target entity from the sample example library, and determining the searched sample examples as sample examples of the multi-intention recognition task.
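The entity-based lookup in claim 8 can be sketched as follows; matching a target entity by substring containment is a simplification for illustration (a real system would presumably use the recognized entity spans), and the dictionary layout is an assumption:

```python
def find_examples_by_entity(library, target_entities):
    """Search the sample example library for examples whose multi-intention
    interaction sentence mentions any target entity (device-name or skill-name
    entities, plus entities in a preset association relation with them)."""
    hits = []
    for ex in library:
        if any(ent in ex["multi"] for ent in target_entities):
            hits.append(ex)
    return hits
```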
9. The method of claim 1, wherein the inputting the single-intent interaction text sentence into a sentence normalization GPT model to obtain a standard single-intent interaction text sentence output by the sentence normalization GPT model comprises:
Constructing a second prompt word based on the single-intent interactive text sentence, a task description sentence converted into a standard single-intent interactive text sentence in the target field, the description sentence in the target field and the sample example of the task converted into the standard single-intent interactive text sentence in the target field; sample examples of the task converted into the standard single-intention interaction text sentence in the target field comprise single-intention interaction text sentence examples in the target field and standard single-intention interaction text sentence examples obtained by converting the single-intention interaction text sentence examples;
Inputting the second prompt word into a sentence standardization GPT model, so that the sentence standardization GPT model analyzes the single-intention interactive text sentence according to the task description sentence converted into the standard single-intention interactive text sentence of the target field, the description sentence of the target field and the sample example of the task converted into the standard single-intention interactive text sentence of the target field;
and obtaining the standard single-intention interactive text statement obtained by analyzing the statement standardized GPT model.
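Mirroring the first prompt word, the second prompt word of claim 9 combines a conversion-task description, a target-domain description, few-shot (raw sentence -> standard sentence) pairs, and the fuzzy sentence to convert. A sketch, with the layout again an assumption:

```python
def build_second_prompt(task_description, domain_description, samples, sentence):
    """Assemble the second prompt word for the sentence standardization GPT
    model: conversion-task description, target-domain description, few-shot
    examples of raw -> standard single-intent sentences, and the sentence to
    be normalized."""
    lines = [task_description, domain_description]
    for raw, standard in samples:
        lines.append(f"Sentence: {raw}")
        lines.append(f"Standard: {standard}")
    lines.append(f"Sentence: {sentence}")
    lines.append("Standard:")
    return "\n".join(lines)
```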
10. The method of claim 3, wherein said converting the voice interactive text statement into a standard single-intent interactive text statement comprises:
Inputting the voice interaction text sentence into a sentence standardization GPT model to obtain a standard single-intention interaction text sentence output by the sentence standardization GPT model; the sentence standardization GPT model is applied based on a sentence standardization prompt word construction mechanism.
11. The method according to any one of claims 1-10, wherein the target domain comprises a device control domain or a life skill domain.
12. A model-based sentence multi-purpose recognition apparatus, the apparatus comprising:
a first judging unit for judging whether the domain classification result of the voice interaction text sentence obtained by voice recognition of the user's voice belongs to the target domain;
The first acquisition unit is used for inputting the voice interaction text sentence into a multi-intention recognition GPT model when the field classification result of the voice interaction text sentence is judged not to belong to the target field, and acquiring a multi-intention recognition result output by the multi-intention recognition GPT model; the multi-intention recognition GPT model is trained based on a model supervision fine tuning mode and/or is applied based on a multi-intention recognition prompt word construction mechanism;
a second judging unit for judging whether the multi-intention recognition result is a plurality of single-intention interactive text sentences;
the first conversion unit is used for converting each single-intention interaction text sentence into a corresponding standard single-intention interaction text sentence when the multi-intention recognition result is judged to be a plurality of single-intention interaction text sentences; wherein each of the standard single-intent interactive text sentences is used for representing one interactive intent in the user speech;
The first conversion unit includes:
The first recognition subunit is used for recognizing the domain classification result of each single-intention interactive text sentence;
a judging subunit, configured to judge whether a domain classification result of the single-intent interactive text sentence belongs to the target domain;
A first determining subunit, configured to determine, when it is determined that a domain classification result of the single-intent interaction text sentence belongs to the target domain, that the single-intent interaction text sentence is a standard single-intent interaction text sentence;
the second determining subunit is used for determining that the single-intention interactive text sentence is a fuzzy single-intention interactive text sentence when the field classification result of the single-intention interactive text sentence is judged not to belong to the target field, and converting the fuzzy single-intention interactive text sentence into a standard single-intention interactive text sentence;
The second determining subunit includes:
the second input subunit is used for inputting the single-intention interactive text sentence into a sentence standardization GPT model and obtaining a standard single-intention interactive text sentence output by the sentence standardization GPT model; the statement standardized GPT model is applied based on a statement standardized prompt word construction mechanism;
The apparatus further comprises:
The second obtaining unit is used for obtaining a training data set comprising training sample data before inputting the voice interaction text sentence into the multi-intention recognition GPT model when the multi-intention recognition GPT model is obtained based on model supervision fine tuning training; the training sample data comprises a description sentence of a multi-intention recognition task, an interactive text sentence example and a plurality of single-intention interactive text sentence examples which are obtained by analysis of the interactive text sentence example;
The splicing unit is used for splicing the description sentences of the multi-intention recognition tasks in the training sample data with the interactive text sentence examples to obtain spliced sentences;
an adding unit, configured to add mask marks to the spliced sentences to obtain model input sentences;
The input unit is used for inputting the model input sentence into a multi-intention recognition GPT model to obtain a plurality of prediction single-intention interactive text sentences output by the multi-intention recognition GPT model;
And the training unit is used for training the multi-intention recognition GPT model according to a plurality of the predicted single-intention interaction text sentences and a plurality of the single-intention interaction text sentence examples.
13. An electronic device, comprising:
One or more processors;
A storage device having one or more programs stored thereon,
The one or more programs, when executed by the one or more processors, cause the one or more processors to implement the model-based statement multi-intention recognition method as claimed in any one of claims 1-11.
14. A computer readable storage medium, having stored thereon a computer program which, when executed by a processor, implements the model-based statement multi-intent recognition method as claimed in any one of claims 1 to 11.
CN202410232119.5A 2024-03-01 2024-03-01 Statement multi-intention recognition method, device and equipment based on model Active CN117807215B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410232119.5A CN117807215B (en) 2024-03-01 2024-03-01 Statement multi-intention recognition method, device and equipment based on model


Publications (2)

Publication Number Publication Date
CN117807215A CN117807215A (en) 2024-04-02
CN117807215B true CN117807215B (en) 2024-05-24

Family

ID=90433785

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410232119.5A Active CN117807215B (en) 2024-03-01 2024-03-01 Statement multi-intention recognition method, device and equipment based on model

Country Status (1)

Country Link
CN (1) CN117807215B (en)

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110209791A (en) * 2019-06-12 2019-09-06 百融云创科技股份有限公司 A multi-round dialogue intelligent voice interaction system and device
CN112800206A (en) * 2021-03-24 2021-05-14 南京万得资讯科技有限公司 Crank call shielding method based on generative multi-round conversation intention recognition
CN114116975A (en) * 2021-11-19 2022-03-01 百融至信(北京)征信有限公司 Multi-intention identification method and system
CN114860938A (en) * 2022-05-17 2022-08-05 上海弘玑信息技术有限公司 Statement intention identification method and electronic equipment
CN115186094A (en) * 2022-07-21 2022-10-14 平安科技(深圳)有限公司 Multi-intention classification model training method and device, electronic equipment and storage medium
WO2022227211A1 (en) * 2021-04-30 2022-11-03 平安科技(深圳)有限公司 Bert-based multi-intention recognition method for discourse, and device and readable storage medium
CN116229955A (en) * 2023-05-09 2023-06-06 海尔优家智能科技(北京)有限公司 Interactive intention information determining method based on generated pre-training GPT model
CN116756277A (en) * 2023-04-20 2023-09-15 海尔优家智能科技(北京)有限公司 Processing method of interactive statement based on target generation type pre-training GPT model
CN117524215A (en) * 2023-09-26 2024-02-06 镁佳(北京)科技有限公司 Voice intention recognition method, device, computer equipment and storage medium

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20220165257A1 (en) * 2020-11-20 2022-05-26 Soundhound, Inc. Neural sentence generator for virtual assistants


Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Zhou Qi'an; Li Zhoujun. Improved model and tuning method for natural language understanding of BERT-based task-oriented dialogue systems. Journal of Chinese Information Processing, 2020, No. 5. *
Yang Chunni; Feng Chaosheng. Multi-intent recognition model combining syntactic features and convolutional neural networks. Journal of Computer Applications, 2018, No. 7. *
Huang Yi; Feng Junlan; Hu Min; Wu Xiaoting; Du Xiaoyu. Architecture and algorithms of intelligent dialogue systems. Journal of Beijing University of Posts and Telecommunications, 2019, No. 6. *


Similar Documents

Publication Publication Date Title
Kreyssig et al. Neural user simulation for corpus-based policy optimisation for spoken dialogue systems
US11132509B1 (en) Utilization of natural language understanding (NLU) models
Cuayáhuitl et al. Evaluation of a hierarchical reinforcement learning spoken dialogue system
US11200885B1 (en) Goal-oriented dialog system
CN114691852B (en) Man-machine conversation system and method
US11295743B1 (en) Speech processing for multiple inputs
US11132994B1 (en) Multi-domain dialog state tracking
US11398226B1 (en) Complex natural language processing
CN110807333A (en) Semantic processing method and device of semantic understanding model and storage medium
WO2023168838A1 (en) Sentence text recognition method and apparatus, and storage medium and electronic apparatus
Duong et al. An adaptable task-oriented dialog system for stand-alone embedded devices
Mishakova et al. Learning natural language understanding systems from unaligned labels for voice command in smart homes
Jeon et al. Language model adaptation based on topic probability of latent dirichlet allocation
US20240185846A1 (en) Multi-session context
Mathur et al. The rapidly changing landscape of conversational agents
CN116994565B (en) Intelligent voice assistant and voice control method thereof
CN114399995A (en) Method, device and equipment for training voice model and computer readable storage medium
CN110851650A (en) Comment output method and device and computer storage medium
Wang et al. Data augmentation for internet of things dialog system
Hakkani-Tür et al. A weakly-supervised approach for discovering new user intents from search query logs
CN117807215B (en) Statement multi-intention recognition method, device and equipment based on model
CN116913274A (en) Scene generation method, device and storage medium based on generation type large model
WO2023173596A1 (en) Statement text intention recognition method and apparatus, storage medium, and electronic apparatus
CN112150103B (en) Schedule setting method, schedule setting device and storage medium
Kalkhoran et al. Detecting Persian speaker-independent voice commands based on LSTM and ontology in communicating with the smart home appliances

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant