WO2023071581A1 - Method, device, apparatus and medium for determining a response sentence - Google Patents

Method, device, apparatus and medium for determining a response sentence

Info

Publication number
WO2023071581A1
WO2023071581A1 (PCT/CN2022/118787)
Authority
WO
WIPO (PCT)
Prior art keywords
sequence
sentence
source
dialogue
loss
Prior art date
Application number
PCT/CN2022/118787
Other languages
English (en)
French (fr)
Inventor
徐爽
黄浩然
李航
Original Assignee
北京有竹居网络技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京有竹居网络技术有限公司
Publication of WO2023071581A1

Links

Images

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/332 Query formulation
    • G06F16/3329 Natural language query formulation or dialogue systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Definitions

  • Exemplary embodiments of the present disclosure generally relate to the field of computers, and in particular to methods, devices, apparatuses, and computer-readable storage media for determining a response sentence.
  • Natural language processing can be applied to many different dialogue processing systems and applications.
  • dialogue processing techniques can be applied to intelligent dialogue systems that can interact with users to assist users in performing specific tasks.
  • For these dialogue processing systems, models such as language understanding models are usually needed to determine accurate response sentences.
  • The accuracy of the determined response sentence affects the accuracy of the dialogue processing task. Therefore, a model used to determine a response sentence is expected to have good processing capability and accuracy.
  • According to exemplary embodiments of the present disclosure, a scheme for determining a response sentence is provided.
  • In a first aspect of the present disclosure, a method is provided. The method includes determining, based on a source sentence sequence and a candidate response sentence and according to a response determination model, a matching score between the candidate response sentence and a correct response sentence corresponding to the source sentence sequence. The method also includes determining a first loss based on the matching score and annotation information of the candidate response sentence. The first loss represents the degree of similarity between the matching score and the annotation information. The annotation information indicates whether the candidate response sentence is the correct response sentence. The method also includes determining a second loss based on a source dialogue sequence, a positive sample dialogue sequence and a negative sample dialogue sequence. The source dialogue sequence includes the source sentence sequence and the candidate response sentence. The positive sample dialogue sequence includes at least the correct response sentence.
  • The negative sample dialogue sequence includes the source sentence sequence and an incorrect response sentence for the source sentence sequence.
  • The second loss represents the degree of similarity between the positive sample dialogue sequence and the source dialogue sequence, compared with the negative sample dialogue sequence.
  • The method also includes training the response determination model based on the values of the first loss and the second loss.
  • In a second aspect of the present disclosure, an electronic device is provided. The device includes at least one processing unit and at least one memory that is coupled to the at least one processing unit and stores instructions for execution by the at least one processing unit.
  • The instructions, when executed by the at least one processing unit, cause the device to perform the following actions: determining, based on the source sentence sequence and the candidate response sentence and according to the response determination model, a matching score between the candidate response sentence and the correct response sentence corresponding to the source sentence sequence; determining a first loss based on the matching score and the annotation information of the candidate response sentence, the first loss representing the degree of similarity between the matching score and the annotation information, and the annotation information indicating whether the candidate response sentence is the correct response sentence; determining a second loss based on a source dialogue sequence, a positive sample dialogue sequence and a negative sample dialogue sequence, the source dialogue sequence including the source sentence sequence and the candidate response sentence, the positive sample dialogue sequence including at least the correct response sentence, the negative sample dialogue sequence including the source sentence sequence and an incorrect response sentence for the source sentence sequence, and the second loss representing the degree of similarity between the positive sample dialogue sequence and the source dialogue sequence, compared with the negative sample dialogue sequence; and training the response determination model based on the values of the first loss and the second loss.
  • In a third aspect of the present disclosure, an apparatus for determining a response sentence is provided. The apparatus includes a matching score determination module configured to determine, based on the source sentence sequence and the candidate response sentence and according to the response determination model, a matching score between the candidate response sentence and the correct response sentence corresponding to the source sentence sequence.
  • the apparatus also includes a first loss determination module configured to determine the first loss based on the matching score and the annotation information of the candidate response sentence. The first loss represents the degree of similarity between the matching score and the annotation information.
  • the annotation information indicates whether the candidate response sentence is a correct response sentence.
  • the apparatus also includes a second loss determination module configured to determine the second loss based on the source dialogue sequence, the positive sample dialogue sequence and the negative sample dialogue sequence.
  • the source dialogue sequence includes the source sentence sequence and the candidate response sentence. The positive sample dialogue sequence includes at least the correct response sentence.
  • the negative sample dialogue sequence includes the source sentence sequence and an incorrect response sentence for the source sentence sequence.
  • the second loss represents the similarity between the positive sample dialogue sequence and the source dialogue sequence compared with the negative sample dialogue sequence.
  • the apparatus also includes a model training module configured to train a response determination model based on values of the first loss and the second loss.
  • a computer readable storage medium is provided.
  • a computer program is stored on the medium, and when the program is executed by the processor, the method in the first aspect is realized.
  • Figure 1 shows a schematic diagram of an example environment in which embodiments of the present disclosure can be implemented
  • Figure 2 shows a schematic diagram of a model training architecture according to some embodiments of the present disclosure
  • Fig. 3 shows a schematic diagram of a dialogue sequence of positive and negative samples according to some embodiments of the present disclosure
  • FIG. 4 shows a flowchart of a process for determining a response statement according to some embodiments of the present disclosure
  • FIG. 5 shows a block diagram of an apparatus for determining a response sentence according to some embodiments of the present disclosure.
  • Figure 6 shows a block diagram of a device capable of implementing various embodiments of the present disclosure.
  • model can learn the relationship between the corresponding input and output from the training data, so that the corresponding output can be generated for the given input after the training is completed.
  • the generation of the model may be based on machine learning techniques.
  • Deep learning is a machine learning algorithm that uses multiple layers of processing units to process input and provide corresponding output.
  • a neural network model is an example of a deep learning based model.
  • a “model” may also be referred to herein as a "machine learning model,” “learning model,” “machine learning network,” or “learning network,” and these terms are used interchangeably herein.
  • As used herein, "determining the parameters of a model" or similar expressions refer to determining the values of the parameters of the model (also referred to as parameter values), including specific values, value sets, or value ranges.
  • a “neural network” is a machine learning network based on deep learning.
  • a neural network is capable of processing input and providing a corresponding output, which generally includes an input layer and an output layer and one or more hidden layers between the input layer and the output layer.
  • Neural networks used in deep learning applications typically include many hidden layers, increasing the depth of the network.
  • the layers of the neural network are connected in sequence so that the output of the previous layer is provided as the input of the subsequent layer, where the input layer receives the input of the neural network, and the output of the output layer serves as the final output of the neural network.
  • Each layer of a neural network consists of one or more nodes (also known as processing nodes or neurons), each of which processes input from the previous layer.
  • machine learning can roughly include three phases, namely training phase, testing phase and application phase (also known as inference phase).
  • training phase a given model can be trained using a large amount of training data, and the parameter values of the model are updated iteratively until the model can obtain consistent inferences that meet the expected goals from the training data.
  • a model can be thought of as being able to learn associations from inputs to outputs (also known as input-to-output mappings) from the training data.
  • the parameter values of the trained model are determined.
  • test input is applied to the trained model to test whether the model can provide the correct output, thereby determining the performance of the model.
  • the model can be used to process the actual input and determine the corresponding output based on the parameter values obtained by training.
  • the training phase can in turn include pre-training and fine-tuning.
  • Pre-training refers to training the model for common tasks, that is, iteratively updating the parameter values of the model.
  • the pre-trained models have a wide range of applications and can be applied to many different downstream tasks.
  • Fine-tuning refers to training a pre-trained model on the specific downstream task to which the model will be applied. The fine-tuned model is more suitable for specific downstream tasks.
  • In dialogue processing tasks, the model is usually required to determine, from a set of candidate dialogue sentences, the target response sentence for multiple rounds of historical dialogue. How to improve the accuracy with which the model determines the target response sentence is a problem worthy of attention.
  • Some conventional schemes perform response determination as a matching task. For example, some schemes use models such as Bidirectional Encoder Representations from Transformers (BERT) to determine matching scores for candidate dialogues.
  • Some methods have been proposed to train BERT models based on loss functions such as cross-entropy to determine accurate response sentences.
  • However, this training method based on a cross-entropy loss function is sensitive to noisy data (such as noisy labels or noisy annotation information). Mislabeled noisy data in the training data set will affect the training results, and models trained only with a cross-entropy-based loss function, such as the BERT model, may not obtain satisfactory results.
  • a solution for determining a response sentence is provided, aiming to solve one or more of the above-mentioned problems and other potential problems.
  • In this scheme, based on the source sentence sequence and the candidate response sentence, the matching score between the candidate response sentence and the correct response sentence is determined according to the response determination model.
  • a first loss is determined based on the matching score and annotation information indicating whether the candidate response sentence is a correct response sentence.
  • the second loss is determined based on a source dialogue sequence consisting of a source sentence sequence and candidate response sentences, a positive sample dialogue sequence including correct response sentences, and a negative sample dialogue sequence including wrong response sentences. Based on the values of the first loss and the second loss, parameters of the response determination model are adjusted.
  • the accuracy of the matching score obtained by the response determination model is improved by optimizing the first loss.
  • By optimizing the second loss, the response determination model can better learn to distinguish correct samples from incorrect samples. In this way, noise in the data can be better removed. This is therefore a supervised learning scheme based on contrastive learning, and with such contrastive-learning-based supervised learning, accurate response sentence determination results can be obtained.
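  • As a minimal illustrative sketch (not part of the patent text, and with all model, data and hyper-parameter names assumed), a single training step combining such a first loss and second loss could look as follows:

```python
# Hypothetical sketch of the described training objective: a binary cross-entropy
# "first loss" on the matching score plus a contrastive "second loss" that pulls the
# source dialogue representation towards the positive sample sequence and pushes it
# away from the negative sample sequence. Names such as match_score/encode are assumed.
import torch
import torch.nn.functional as F

def training_step(model, batch, tau=0.1, lam=1.0):
    # First loss: agreement between the predicted matching score and the annotation.
    score = model.match_score(batch["source"], batch["candidate"])   # values in (0, 1)
    first_loss = F.binary_cross_entropy(score, batch["label"].float())

    # Second loss: contrastive comparison of source / positive / negative sequences.
    h_src = model.encode(batch["source_dialogue"])     # [N, d] classification tokens
    h_pos = model.encode(batch["positive_dialogue"])   # [N, d]
    h_neg = model.encode(batch["negative_dialogue"])   # [N, d]
    sim_pos = F.cosine_similarity(h_src, h_pos) / tau  # [N]
    sim_neg = F.cosine_similarity(h_src, h_neg) / tau  # [N]
    second_loss = -torch.log(sim_pos.exp() / (sim_pos.exp() + sim_neg.exp())).mean()

    # Total loss as a weighted linear combination of the two losses.
    return first_loss + lam * second_loss
```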
  • FIG. 1 shows a schematic diagram of an example environment 100 in which embodiments of the present disclosure can be implemented.
  • In the environment 100 of FIG. 1, it is desirable to train and apply a response determination model 120 to determine accurate response sentences.
  • the response sentence refers to the next answer sentence after one or more rounds of dialogue sentences have occurred.
  • a conversation may be between two or more users. Additionally or alternatively, conversations may also take place between one or more users and one or more devices, such as intelligent voice assistants.
  • response determination model 120 is configured to generate match score 130 for candidate response sentence 104 based on source sentence sequence 102 and candidate response sentence 104 .
  • Match score 130 represents the degree of match between candidate response sentence 104 and the correct response sentence corresponding to source sentence sequence 102 .
  • Source statement sequence 102 may include multiple statements.
  • the source sentence sequence 102 may be one or more rounds of dialogue that have occurred before.
  • a correct response sentence represents the correct or authentic next sentence of dialogue immediately following the source sentence sequence 102 .
  • the sentences in source sentence sequence 102 may be natural language input received from a user or other system. Sentences are also referred to as utterances in this disclosure. Statements can be in various formats, such as spoken, textual, etc. In this disclosure, textual sentences will be used for the purpose of illustration, but this disclosure is not limited thereto, and this disclosure will cover any form of sentences.
  • environment 100 includes computing system 110 .
  • computing system 110 is configured to train the response determination model 120 using the source sentence sequence 102, the candidate response sentence 104, and additional training data (not shown), thereby determining the parameters of the response determination model 120.
  • response determination model 120 may be a pre-trained model.
  • Computing system 110 is configured to fine-tune response determination model 120 using source sentence sequence 102 , candidate response sentences 104 , and additional other training data, thereby updating and adjusting parameters of response determination model 120 .
  • N is an integer greater than or equal to 1.
  • Computing system 110 also trains or fine-tunes response determination model 120 using additional other training data, as previously described. The process of training or fine-tuning the response determination model 120 will be described in more detail below with reference to FIGS. 2 and 3 .
  • computing system 110 may utilize response determination model 120 to determine match scores 130 for candidate response statements 104 .
  • the candidate response sentence with the highest matching score can be determined from the plurality of candidate response sentences as the response sentence corresponding to the source sentence sequence.
  • accurate response sentences can be obtained.
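  • As an illustration of this ranking step (assumed API, not the patent's reference implementation), selecting the best candidate could look like:

```python
# Hypothetical sketch: rank candidate response sentences by the matching score
# produced by a trained response determination model and return the best one.
def select_response(model, source_sentences, candidates):
    scored = [(model.match_score(source_sentences, cand), cand) for cand in candidates]
    best_score, best_candidate = max(scored, key=lambda pair: pair[0])
    return best_candidate, best_score

# Example usage with toy data:
# response, score = select_response(
#     model,
#     ["Hi, can you help me book a flight?", "Sure, where are you flying to?"],
#     ["To Paris, next Friday.", "I like pizza."],
# )
```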
  • the computing system 110 may be any system with computing capabilities, such as various computing devices/systems, terminal devices, servers, and so on.
  • Terminal devices may be any type of mobile terminal, fixed terminal, or portable terminal, including mobile phones, desktop computers, laptop computers, notebook computers, netbook computers, tablet computers, media computers, multimedia tablets, or any combination of the foregoing, including accessories and peripherals of these devices, or any combination thereof.
  • Servers include, but are not limited to, mainframes, edge computing nodes, computing devices in cloud environments, and the like.
  • It should be understood that the components and arrangement of the environment shown in FIG. 1 are examples only, and that computing systems suitable for implementing the example embodiments described in this disclosure may include one or more different components, other components, and/or different arrangements. Embodiments of the present disclosure are not limited in this respect.
  • FIG. 2 shows an example of a model training architecture 200 according to some embodiments of the present disclosure.
  • Architecture 200 of FIG. 2 may be implemented in computing system 110 of FIG. 1 or in other suitable systems or devices.
  • Each module/component in the architecture 200 may be implemented by hardware, software, firmware or any combination thereof.
  • the source dialogue sequence 250 may be composed of the source sentence sequence 102 and the candidate response sentence 104.
  • the annotation information 204 in FIG. 2 is associated with the candidate response sentence 104.
  • the annotation information 204 indicates whether the candidate response sentence 104 is the correct response sentence corresponding to the source sentence sequence 102. For example, annotation information 204 of 0 may indicate that the candidate response sentence 104 is an incorrect response sentence, and annotation information 204 of 1 may indicate that the candidate response sentence 104 is the correct response sentence. It should be understood that the annotation information 204 may also use other methods (for example, the letters "T" or "F") or other values to indicate whether the candidate response sentence 104 is the correct response sentence.
  • only candidate response sentences 104 whose labeling information is 1 are used as training data. That is, only candidate response sentences 104 that are correct response sentences are selected as training data. Additionally or alternatively, both correct and incorrect candidate response sentences may also be selected as training data.
  • response determination model 120 generates match score 130 for candidate response sentence 104 based on source sentence sequence 102 and candidate response sentence 104 .
  • computing system 110 may determine a first loss according to first loss function 210 based on matching score 130 and annotation information 204 .
  • the first loss represents the degree of similarity between the matching score 130 and the annotation information 204 . That is, the first loss may indicate whether the match score 130 is accurate. For example, a larger value of the first loss indicates that the matching score 130 is less similar to the annotation information 204, that is, the matching score 130 is less accurate.
  • the first loss function 210 may be, for example, a cross-entropy function. It should be understood that the first loss function 210 may also adopt other appropriate loss functions, such as a quadratic cost function, and the embodiments of the present disclosure are not limited in this respect.
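  • As an illustration only (one common binary cross-entropy form, not necessarily the exact formula of the patent), the first loss over N training samples could be written as

$$\mathcal{L}_{1} = -\frac{1}{N}\sum_{i=1}^{N}\Big[y_i \log F(c_i, r_i) + (1 - y_i)\log\big(1 - F(c_i, r_i)\big)\Big],$$

where F(c_i, r_i) denotes the matching score 130 for the i-th sample and y_i denotes its annotation information 204.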
  • In some embodiments, the first loss may be determined according to the first loss function 210 based on the matching score 130 corresponding to the source sentence sequence 102, the annotation information 204, and the positive sample dialogue sequence 240 and the negative sample dialogue sequence 260.
  • the positive sample dialogue sequence 240 and the negative sample dialogue sequence 260 will be described in detail below.
  • FIG. 2 also shows that the second loss is determined according to the second loss function 220 based on the source dialogue sequence 250 , the positive sample dialogue sequence 240 and the negative sample dialogue sequence 260 .
  • the source dialog sequence 250 includes the source sentence sequence 102 and the candidate response sentence 104 .
  • the positive sample dialogue sequence 240 includes at least the correct response sentence, while the negative sample dialogue sequence 260 includes the source sentence sequence 102 and an incorrect response sentence for the source sentence sequence 102.
  • the second loss represents the degree of similarity between the positive sample dialogue sequence 240 and the source dialogue sequence 250 compared to the negative sample dialogue sequence 260 .
  • a higher value of the second loss indicates that the similarity between the positive sample dialogue sequence 240 and the source dialogue sequence 250 is lower than that of the negative sample dialogue sequence 260 .
  • A higher second loss value also indicates that the response determination model 120 has not learned the distinguishing features between positive and negative sample dialogue sequences well.
  • the positive sample dialog sequence 240 can be generated through a data augmentation operation.
  • a positive sample dialogue sequence can be determined based on the source sentence sequence 102 and the correct response sentence.
  • Fig. 3 shows a schematic diagram of a dialogue sequence of positive and negative samples according to some embodiments of the present disclosure.
  • the dialog sequence 300 represents the correct dialog sequence obtained from the training dataset.
  • A and B in FIG. 3 respectively represent two parties in a conversation, for example, two users or a user and a conversation device.
  • The dialogue sequence 300 includes the sentence sequence 305 of the dialogue previously conducted between A and B, and the response sentence 310 following the sentence sequence 305. It should be understood that the sentence sequence 305 shown in FIG. 3 is merely illustrative, and in some embodiments the sentence sequence 305 may involve users other than A and B. The sentence sequence 305 may also include fewer or more sentences.
  • dialogue sequence 300 may be used as source dialogue sequence 250 .
  • statement sequence 305 may be used as source statement sequence 102 and response statement 310 may be used as candidate response statement 104 .
  • the annotation information 204 associated with the candidate response statement 104 may indicate that the candidate response statement 104 is a correct response statement (eg, the annotation information 204 may be 1).
  • The negative sample dialogue sequence 320 includes the sentence sequence 305 and an incorrect response sentence 330.
  • The incorrect response sentence 330 may be any sentence, selected from the training data set, that differs from the correct response sentence 310.
  • FIG. 3 also shows a positive sample dialog sequence 340 generated through the data augmentation operation.
  • In the positive sample dialogue sequence 340, the order of two sentences in the sentence sequence 305 is swapped. That is, compared with the sentence sequence 305, the sentence sequence 345 with the swapped order exchanges the positions of sentence 350 and sentence 355.
  • The sentences whose order is swapped are not the last sentence in the sentence sequence 305.
  • FIG. 3 also shows a positive sample dialogue sequence 360 generated through another data augmentation operation.
  • In the positive sample dialogue sequence 360, the order of the words in one sentence of the sentence sequence 305 is changed. That is, compared with the sentence sequence 305, the sentence sequence 365 with the changed word order changes the order of the words in sentence 370.
  • The sentence with the changed word order is not the last sentence in the sentence sequence 305.
  • the two examples of data augmentation operations described above are only illustrative, and other data augmentation operations may also be used to obtain positive sample dialogue sequences. Additionally or alternatively, the two example data augmentation operations described above may be combined. For example, by changing the order of some sentences in the sentence sequence 305, and at the same time changing the order of words in one or some sentences in the sentence sequence 305, a positive sample dialogue sequence can be obtained.
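  • A minimal sketch of these two augmentation operations (helper names assumed, not taken from the patent) is shown below; both leave the final response sentence untouched so that the augmented sequence remains a positive sample.

```python
# Hypothetical sketch of the two described augmentations: swap two non-final
# sentences, or shuffle the words inside one non-final sentence.
import random

def swap_two_sentences(sentences):
    """Swap two randomly chosen sentences, excluding the last (response) sentence."""
    augmented = list(sentences)
    i, j = random.sample(range(len(sentences) - 1), 2)  # assumes at least 3 sentences
    augmented[i], augmented[j] = augmented[j], augmented[i]
    return augmented

def shuffle_words_in_one_sentence(sentences):
    """Shuffle the word order inside one randomly chosen non-final sentence."""
    augmented = list(sentences)
    k = random.randrange(len(sentences) - 1)
    words = augmented[k].split()
    random.shuffle(words)
    augmented[k] = " ".join(words)
    return augmented
```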
  • dialog sequence may be in any appropriate language, for example, a Chinese dialog sentence sequence may also be used.
  • the computing system 110 may determine the second loss according to the second loss function 220 based on the source dialogue sequence 250 , the positive example dialogue sequence 240 and the negative example dialogue sequence 260 .
  • computing system 110 may transmit the determined value of the first loss and the determined value of the second loss to response determination model 120 . In this way, the computing system 110 can use the value of the first loss and the value of the second loss to train the response determination model 120 through backpropagation.
  • the computing system 110 may preset the first loss threshold and the second loss threshold. Computing system 110 finishes training response determination model 120 when it is determined that the value of the first loss does not exceed the first loss threshold and the value of the second loss does not exceed the second loss threshold. In this way, the training process of the response determination model 120 can be simplified. In addition, by setting the first loss threshold and the second loss threshold in advance, it can be ensured that the trained response determination model 120 has higher accuracy.
  • the computing system 110 may also obtain the total loss 230 based on a linear combination of the first loss and the second loss. For example, computing system 110 may add the first loss and the second loss to determine total loss 230 . The value of the total loss 230 may reflect the performance of the response determination model 120 . A smaller value of the total loss 230 indicates a more accurate result of the response determination model 120 . Computing system 110 may transmit total loss 230 to response determination model 120 , and response determination model 120 is trained by optimizing total loss 230 .
  • the computing system 110 may also change the respective weights of the first loss and the second loss in the total loss 230 to obtain the total loss 230 .
  • the respective weights of the first loss and the second loss in the total loss 230 are both 1.
  • the weight of the first loss or the second loss can be changed, for example, the weight of the first loss can be set higher than the second loss or lower than the second loss.
  • computing system 110 may adjust parameters of response determination model 120 through, for example, backpropagation. For example, computing system 110 may optimize parameters of response determination model 120 by minimizing total loss 230 . Additionally or alternatively, in some embodiments, the computing system 110 may also preset a total loss threshold. Computing system 110 finishes training response determination model 120 when total loss 230 is determined to be below the total loss threshold. In this way, the training process of the response determination model 120 can be simplified. In addition, the accuracy of the response determination model 120 thus trained can be guaranteed through the preset total loss threshold.
  • response determination model 120 may include a pre-trained language model.
  • the response determination model 120 may include a pre-trained BERT model or other suitable neural network models.
  • the computing system 110 may determine the source dialogue token sequence, the positive dialogue token sequence and the negative dialogue token sequence respectively based on the source dialogue sequence 250, the positive dialogue sequence 240 and the negative dialogue sequence 260 according to the pre-trained language model.
  • a dialogue token sequence is also known as an implicit representation of a dialogue.
  • the computing system 110 may also perform post-training on the pre-trained language model before it is applied.
  • sample data (c_i, r_i, y_i) can be randomly selected from the training data set, where c_i = {u_1, u_2, ..., u_{l_i}} represents the source sentence sequence, l_i represents the number of sentences in c_i, u_i represents a sentence in c_i, r_i represents the candidate response sentence, and y_i ∈ {0, 1} represents the annotation information of r_i.
  • a data augmentation operation may be performed on the sample context dialogue data d i .
  • sentences are sampled for a next sentence prediction (NSP) task.
  • a positive sample is generated with the last sentence of d i as the response sentence and the other sentences of d i as the context before the response sentence.
  • negative samples d″_i are constructed such that two thirds use an incorrect response sentence and context sentences selected from the corpus, and one third use a sentence selected from the same context.
  • a pre-trained language model is post-trained for tasks such as masked language model (MLM) based on the positive and negative samples described above. MLM randomly masks some tokens from the input, and then predicts the masked tokens based on their contextual information.
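  • A rough sketch of how such post-training samples could be assembled (sampling details and data layout are assumptions, not quoted from the patent) is:

```python
# Hypothetical sketch: build next-sentence-prediction samples for post-training.
# The last sentence of a dialogue is the response of the positive sample; the
# negative sample replaces it with a sentence drawn from the corpus (about two
# thirds of the time) or from the same context (about one third of the time).
import random

def build_nsp_samples(dialogue, corpus_sentences):
    context, response = dialogue[:-1], dialogue[-1]
    positive = (context, response, 1)

    if random.random() < 2 / 3:
        wrong = random.choice(corpus_sentences)  # wrong response from the corpus
    else:
        wrong = random.choice(context)           # sentence from the same context
    negative = (context, wrong, 0)
    return positive, negative
```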
  • the response determination model 120 may include a language model post-trained in a manner such as described above.
  • computing system 110 may represent source dialog sequence 250 as follows:
  • i indicates that the source dialogue sequence 250 is the i-th source dialogue sequence,
  • x_i indicates the source dialogue sequence 250, and
  • r_i represents the candidate response sentence 104.
  • the start element “[CLS]” of the source dialogue sequence 250 is a sentence classification identifier, which represents the classification of the dialogue sequence.
  • the element "[SEP]” is a dialog separation identifier, which represents the separation between the dialog sequence and the response sentence.
  • the element "[EOT]" is an end-of-statement marker, which is used to separate different sentences within the dialogue sequence.
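  • The concatenated input referred to above as formula (1) does not appear in this text; based on the element descriptions, one plausible reconstruction (an assumption rather than a verbatim quote) is

$$x_i = [\mathrm{CLS}]\; u_{i,1}\; [\mathrm{EOT}]\; u_{i,2}\; [\mathrm{EOT}]\;\ldots\; u_{i,l_i}\; [\mathrm{SEP}]\; r_i\; [\mathrm{SEP}],$$

where u_{i,1}, ..., u_{i,l_i} are the sentences of the source sentence sequence 102 and r_i is the candidate response sentence 104.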
  • the source dialogue sequence 250 expressed as formula (1) can be input to the pre-trained language model to obtain the corresponding source dialogue token sequence.
  • the dialogue token sequence is also referred to herein as a dialogue embedding sequence.
  • An example source dialogue token sequence can be represented as follows:
  • E_[CLS] represents the classification mark corresponding to the start element "[CLS]" of the source dialogue sequence 250,
  • E_[SEP] represents the mark corresponding to the end element "[SEP]" of the source dialogue sequence 250, and
  • the other tokens E_{m,n,1}, E_{m,n,2}, ..., E_{m,n,k} represent tokens corresponding to the other elements in the source dialogue sequence 250.
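  • The token sequence referred to above as formula (2) likewise does not appear in this text; judging from the index pattern in the surrounding text, it plausibly lists one embedding per input element, for example

$$E_i = \big[E_{[\mathrm{CLS}]},\; E_{m,1,1},\ldots,E_{m,1,k},\;\ldots,\; E_{m,n,1},\ldots,E_{m,n,k},\; E_{[\mathrm{SEP}]}\big],$$

where E_{m,n,k} would denote the embedding of the k-th token of the n-th sentence of the m-th dialogue sequence (an interpretation inferred from the text, not confirmed by it).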
  • A data augmentation operation can also be used to obtain a data-augmented classification mark E′_[CLS].
  • For example, different masks can be applied to the source dialogue sequence 250, and the masked source dialogue sequence 250 can be input into the trained language model to obtain the classification mark E′_[CLS] produced through data augmentation.
  • Through the data augmentation operation, more training data can be obtained, so that the model can be trained better.
  • The E_[CLS] output by the trained language model, i.e., the classification mark, can be used as the representative mark of the source dialogue sequence 250.
  • Computing system 110 may determine match score 130 according to response determination model 120 based on the classification labels of source dialog sequence 250 .
  • response determination model 120 may include a pre-trained classifier.
  • response determination model 120 may include a pre-trained two-layer neural network classifier.
  • a pre-trained classifier may determine a match score 130 corresponding to a source dialogue sequence 250 (ie, a candidate response sentence 104) by, for example, the following:
  • W_o, W_h, b_o and b_h represent adjustable parameters of the classifier.
  • σ_1 is an activation function, such as the GELU activation function.
  • σ_2 is the sigmoid function.
  • F(c_m, r_m) represents the matching score 130.
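  • The classifier formula referred to above as formula (3) is not reproduced here; for a two-layer classifier with the parameters listed above, a plausible reconstruction is

$$F(c_m, r_m) = \sigma_2\big(W_o\,\sigma_1(W_h\,E_{[\mathrm{CLS}]} + b_h) + b_o\big),$$

where E_[CLS] is the classification mark of the source dialogue sequence 250, σ_1 is, for example, the GELU activation, and σ_2 is the sigmoid function.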
  • match score 130 may be a value between 0 and 1 . For example, the closer the matching score 130 is to 1, the more likely the candidate response sentence 104 is a correct response sentence. That is, the closer the matching score 130 is to 1, the closer the candidate response sentence 104 is to the correct response sentence.
  • classifiers described above are only illustrative and not restrictive. Other suitable classifiers may be employed and other suitable activation functions may be used.
  • the parameters of the classifier, such as W_o, W_h, b_o and b_h, can be tuned and optimized by optimizing the total loss 230. An example method for determining the total loss 230 will be described next.
  • the total loss 230 may be determined by a linear combination of the first loss and the second loss, e.g., summing the first loss and the second loss.
  • the first loss may be determined by using a first loss function 210 such as cross-entropy.
  • the second loss may be determined by using the second loss function 220 .
  • As described above, the source dialogue token sequence as well as the classification mark and the augmented classification mark of the source dialogue sequence 250 can be determined.
  • Similarly, the computing system 110 can determine the positive sample dialogue token sequence and the negative sample dialogue token sequence based on the positive sample dialogue sequence 240 and the negative sample dialogue sequence 260, respectively, according to the pre-trained language model.
  • positive sample classification labels and negative sample classification labels corresponding to the positive sample dialogue sequence 240 and the negative sample dialogue sequence 260 may also be determined respectively.
  • the second loss may be determined based on the degree of similarity between the source dialogue token sequence and the positive dialogue token sequence and the similarity between the source dialogue token sequence and the negative dialogue token sequence. For example, the second loss may be determined according to the second loss function 220 based on the source dialogue classification label, the positive sample classification label and the negative sample classification label.
  • The following describes an example of determining the second loss according to the example second loss function 220, based on a set of source dialogue sequences and the corresponding sets of positive sample dialogue sequences and negative sample dialogue sequences.
  • H, H+ and H− represent the source dialogue classification label set, the positive sample classification label set and the negative sample classification label set, respectively.
  • H, H+ and H− correspond to the source dialogue sequence set, the positive sample dialogue sequence set and the negative sample dialogue sequence set, respectively.
  • H, H+ and H− each include N classification labels. N can be any natural number.
  • f() represents the similarity function.
  • a cosine similarity function may be used as the similarity function.
  • τ represents a preset temperature parameter. τ can be a value between 0 and 1. The temperature parameter τ can be adjusted according to the training data set of the actual application.
  • Z(H, H + , H ⁇ ) can be used to represent the similarity between the source dialogue classification labels in the source dialogue classification label set and other classification labels in the positive sample classification label set and the negative sample classification label set.
  • Z(H, H+, H−) can be calculated as follows.
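  • Formulas (4) and (5) do not appear in this text; based on the quantities defined above, one plausible InfoNCE-style reading (an assumption, not a verbatim quote of the patent) is

$$\mathcal{L}_{2} = -\frac{1}{N}\sum_{i=1}^{N}\log\frac{\exp\big(f(h_i, h_i^{+})/\tau\big)}{Z(H, H^{+}, H^{-})},\qquad Z(H, H^{+}, H^{-}) = \sum_{j=1}^{N}\Big[\exp\big(f(h_i, h_j^{+})/\tau\big) + \exp\big(f(h_i, h_j^{-})/\tau\big)\Big],$$

where h_i ∈ H, h_j^+ ∈ H+ and h_j^- ∈ H− are the source, positive sample and negative sample classification labels, f(·,·) is the similarity function (for example cosine similarity) and τ is the temperature parameter.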
  • An example of the second loss function 220 is described above. In some embodiments, other second loss functions may also be used. Another example second loss function 220 is shown below.
  • The second loss function in formula (6) uses Z″(H, H+, H−), which is different from the Z(H, H+, H−) in formula (4).
  • Z″(H, H+, H−) can be calculated as follows:
  • Compared with Z(H, H+, H−), Z″(H, H+, H−) adds an additional term, where α is the penalty coefficient.
  • the penalty coefficient ⁇ can be a value between 0 and 1.
  • the total loss 230 may be determined in the following manner:
  • λ represents the weight coefficient.
  • the value of the weight coefficient ⁇ may be preset, for example, may be a value between 0 and 1.
  • the value of the weight coefficient ⁇ can also be adjusted according to the training data set or actual application. By using the weight coefficient ⁇ , the proportions of the first loss and the second loss in the total loss 230 can be adjusted, thereby better adapting to different training data sets or different practical applications.
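  • The total-loss formula referred to above as formula (8) does not appear in this text; combining the statements that the total loss 230 is a linear combination of the two losses with weight coefficient λ, a plausible reconstruction is

$$\mathcal{L}_{total} = \mathcal{L}_{1} + \lambda\,\mathcal{L}_{2},$$

where L_1 is the first loss, L_2 is the second loss, and λ is the weight coefficient (the exact placement of λ is an assumption).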
  • parameters of the response determination model 120 can be adjusted and optimized.
  • parameters of classifiers such as W o , W h , b o and b h , etc. can be tuned and optimized.
  • the parameters of the pre-trained language model can also be adjusted.
  • the computing system 110 may also preset a total loss threshold.
  • Computing system 110 finishes training response determination model 120 when it is determined that the value of total loss 230 is below the total loss threshold. In this way, the training process of the response determination model 120 can be simplified. Furthermore, the accuracy of the response determination model 120 thus trained can be guaranteed by presetting the total loss threshold. It should be understood that the total loss threshold can be adjusted according to different training data sets, different response determination models, and different tasks.
  • the response determination model described above includes a pre-trained model
  • the response determination model may include a non-pretrained model.
  • This non-pretrained model may have initial parameters.
  • a non-pretrained response determination model can be trained using the training procedure described with reference to FIG. 2 to tune the parameters of the model.
  • the determination of the response sentence can be realized through the method based on contrastive learning.
  • a model trained or fine-tuned in this way is better able to learn the differential features between positive and negative sample conversations. In this way, the influence of noise data on the model can be removed, and more accurate response sentences can be determined.
  • model training or fine-tuning process of this solution can also be applied to other natural language processing tasks other than determining the response sentence, such as speech understanding, intent classification, and so on.
  • Models obtained using the training process described herein can be applied to dialogue systems such as digital assistants, chatbots, automated systems, and other such systems.
  • Table 1 compares, on different training data sets, the accuracy of a response determination model obtained with the training method of this solution on the response sentence determination task with the accuracy of conventional models.
  • the model trained using this scheme has better accuracy on various training data sets than the model of the conventional scheme.
  • a better-performing response dialogue determination model can be obtained.
  • the dialog system using the response dialog determination model of this solution can have better performance.
  • FIG. 4 shows a flowchart of a process 400 for determining a response statement according to some embodiments of the present disclosure.
  • Process 400 may be implemented at computing system 110 .
  • Process 400 may also be implemented at other suitable computing systems or computing devices.
  • the match score 130 between the candidate response sentence 104 and the correct response sentence corresponding to the source sentence sequence 102 is determined according to the response determination model 120 .
  • a first loss is determined based on the matching score 130 and the annotation information of the candidate response sentence 104 .
  • the first loss represents the degree of similarity between the matching score 130 and the annotation information.
  • the annotation information indicates whether the candidate response sentence 104 is a correct response sentence. For example, label information of 1 indicates that the candidate response sentence 104 is a correct response sentence, and label information of 0 indicates that the candidate response sentence 104 is an incorrect response sentence.
  • a second loss is determined based on the source dialogue sequence, the positive example dialogue sequence, and the negative example dialogue sequence.
  • the source dialog sequence includes a source sentence sequence 102 and a candidate response sentence 104 .
  • a positive dialogue sequence contains at least correct response sentences.
  • a negative sample dialogue sequence includes the source sentence sequence 102 and an incorrect response sentence for the source sentence sequence 102.
  • the second loss represents the similarity between the positive sample dialogue sequence and the source dialogue sequence compared with the negative sample dialogue sequence.
  • a positive sample dialog sequence may be determined based on the source sentence sequence 102 and the correct response sentence. For example, the order of the first statement and the second statement in the source statement sequence 102 may be swapped. The first statement and the second statement are not the last statements in the source statement sequence 102 .
  • the positive dialogue sequence can also be determined based on the source sentence sequence 102 with the sentence order exchanged and the correct response sentence.
  • the order of the words in the third sentence in the sequence of source sentences 102 may be changed.
  • the third statement is not the last statement in the sequence of source statements 102 .
  • Positive dialogue sequences can also be determined based on the source sentence sequence 102 with the order of words changed and the correct response sentences.
  • response determination model 120 may include a pre-trained language model.
  • the source dialogue token sequence, positive sample dialogue token sequence and negative sample dialogue token sequence can be respectively determined according to the pre-trained language model.
  • the second loss may also be determined based on the degree of similarity between the source dialogue token sequence and the positive sample dialogue token sequence and the similarity between the source dialogue token sequence and the negative sample dialogue token sequence.
  • response determination model 120 is trained based on the values of the first loss and the second loss.
  • response determination model 120 may include a pre-trained language model.
  • response determination model 120 may also include a pre-trained classifier.
  • a source dialogue classification label may be determined according to the pre-trained language model. The matching score may then be determined by the pre-trained classifier based on the source dialogue classification label.
  • the respective parameters of the pre-trained classifier and the pre-trained language model can be adjusted by minimizing the sum of the first loss and the second loss to obtain the trained response determination model 120 .
  • Fig. 5 shows a block diagram of an apparatus 500 for determining a response sentence according to some embodiments of the present disclosure.
  • Apparatus 500 may be implemented as or included in computing system 110, for example.
  • Each module/component in the device 500 may be implemented by hardware, software, firmware or any combination thereof.
  • the apparatus 500 includes a matching score determination module 510 configured to determine, based on the source sentence sequence 102 and the candidate response sentence 104 and according to the response determination model 120, the matching score 130 between the candidate response sentence 104 and the correct response sentence corresponding to the source sentence sequence 102.
  • the apparatus 500 further includes a first loss determining module 520 configured to determine the first loss based on the matching score 130 and the annotation information of the candidate response sentence 104 .
  • the first loss represents the degree of similarity between the matching score 130 and the annotation information.
  • the annotation information indicates whether the candidate response sentence 104 is a correct response sentence.
  • the apparatus 500 further includes a second loss determination module 530 configured to determine a second loss based on the source dialogue sequence, the positive sample dialogue sequence and the negative sample dialogue sequence.
  • the source dialog sequence includes a source sentence sequence 102 and a candidate response sentence 104 .
  • a positive dialogue sequence contains at least correct response sentences.
  • a negative sample dialogue sequence includes the source sentence sequence 102 and an incorrect response sentence for the source sentence sequence 102.
  • the second loss represents the similarity between the positive sample dialogue sequence and the source dialogue sequence compared with the negative sample dialogue sequence.
  • a positive sample dialogue sequence determination module may also be included, configured to determine a positive sample dialogue sequence based on the source sentence sequence 102 and the correct response sentence.
  • the positive sample dialogue sequence determination module may be configured to exchange the order of the first sentence and the second sentence in the source sentence sequence 102 .
  • the first statement and the second statement are not the last statements in the source statement sequence 102 .
  • the positive dialogue sequence determining module may also be configured to determine the positive dialogue sequence based on the source sentence sequence 102 with the exchanged sentence order and the correct response sentence.
  • the positive sample dialogue sequence determination module may be configured to change the order of words in the third sentence in the source sentence sequence 102 .
  • the third statement is not the last statement in the sequence of source statements 102 .
  • the positive sample dialogue sequence determination module may also be configured to determine the positive sample dialogue sequence based on the source sentence sequence 102 with the word order changed and the correct response sentence.
  • response determination model 120 may include a pre-trained language model.
  • the second loss determination module 530 may be configured to determine the source dialogue token sequence, the positive dialogue token sequence and the negative dialogue token sequence respectively according to the pre-trained language model based on the source dialogue sequence, positive sample dialogue sequence and negative sample dialogue sequence .
  • the second loss determination module 530 can also be configured to determine the second loss based on the similarity between the source dialogue token sequence and the positive sample dialogue token sequence and the similarity between the source dialogue token sequence and the negative sample dialogue token sequence.
  • the apparatus 500 further includes a model training module 540 configured to train the response determination model 120 based on the first loss and the second loss.
  • response determination model 120 may include a pre-trained language model. In some embodiments, response determination model 120 may also include a pre-trained classifier. In some embodiments, the matching score determination module 510 may be configured to determine source dialogue classification marks based on the source sentence sequence 102 and the candidate response sentence 104 according to the pre-trained language model. The matching score determination module 510 may also be configured to determine a matching score based on the source dialogue classification marks according to the pre-trained classifier. Additionally or alternatively, in some embodiments, the model training module 540 can be configured to adjust the respective parameters of the pre-trained classifier and the pre-trained language model by minimizing the sum of the first loss and the second loss, to obtain the trained response determination model 120.
  • FIG. 6 shows a block diagram illustrating a computing device 600 in which one or more embodiments of the present disclosure may be implemented. It should be understood that the computing device 600 shown in FIG. 6 is exemplary only and should not constitute any limitation on the functionality and scope of the embodiments described herein. The computing device 600 shown in FIG. 6 may be used to implement the computing system 110 of FIG. 1 .
  • computing device 600 is in the form of a general-purpose computing device.
  • Components of computing device 600 may include, but are not limited to, one or more processors or processing units 610, memory 620, storage devices 630, one or more communication units 640, one or more input devices 650, and one or more output devices 660.
  • the processing unit 610 may be an actual or virtual processor and is capable of performing various processes according to programs stored in the memory 620 .
  • multiple processing units execute computer-executable instructions in parallel to increase the parallel processing capability of the computing device 600 .
  • Computing device 600 typically includes a plurality of computer storage media. Such media can be any available media that is accessible by computing device 600 , including but not limited to, volatile and nonvolatile media, removable and non-removable media.
  • Memory 620 can be volatile memory (eg, registers, cache, random access memory (RAM)), nonvolatile memory (eg, read only memory (ROM), electrically erasable programmable read only memory (EEPROM) , flash memory) or some combination of them.
  • Storage device 630 may be removable or non-removable media, and may include machine-readable media, such as flash drives, magnetic disks, or any other media that may be capable of storing information and/or data (e.g., training data for training ) and can be accessed within computing device 600.
  • Computing device 600 may further include additional removable/non-removable, volatile/nonvolatile storage media.
  • For example, a disk drive for reading from or writing to a removable, nonvolatile magnetic disk (such as a "floppy disk") and an optical disk drive for reading from or writing to a removable, nonvolatile optical disk (such as a CD-ROM) may be provided.
  • each drive may be connected to the bus (not shown) by one or more data media interfaces.
  • Memory 620 may include a computer program product 625 having one or more program modules configured to perform the various methods or actions of the various embodiments of the present disclosure.
  • the communication unit 640 enables communication with other computing devices through the communication medium. Additionally, the functionality of the components of computing device 600 may be implemented in a single computing cluster or as a plurality of computing machines capable of communicating via communication links. Accordingly, computing device 600 may operate in a networked environment using logical connections to one or more other servers, a network personal computer (PC), or another network node.
  • Input device 650 may be one or more input devices, such as a mouse, keyboard, trackball, and the like.
  • Output device 660 may be one or more output devices, such as a display, speakers, printer, or the like.
  • The computing device 600 can also communicate, as needed, through the communication unit 640 with one or more external devices (not shown), such as storage devices, display devices, and the like, with one or more devices that enable the user to interact with the computing device 600, or with any device (e.g., a network card, a modem, etc.) that enables the computing device 600 to communicate with one or more other computing devices. Such communication may be performed via an input/output (I/O) interface (not shown).
  • a computer-readable storage medium on which computer-executable instructions are stored, wherein the computer-executable instructions are executed by a processor to implement the methods described above.
  • a computer program product tangibly stored on a non-transitory computer-readable medium and comprising computer-executable instructions, and the computer-executable instructions are executed by a processor to implement the method described above.
  • These computer-readable program instructions may be provided to a processing unit of a general-purpose computer, a special-purpose computer, or another programmable data processing apparatus to produce a machine, such that these instructions, when executed by the processing unit of the computer or other programmable data processing apparatus, produce an apparatus that implements the functions/actions specified in one or more blocks of the flowcharts and/or block diagrams.
  • These computer-readable program instructions may also be stored in a computer-readable storage medium; these instructions cause a computer, a programmable data processing apparatus and/or other devices to work in a specific way, so that the computer-readable medium storing the instructions constitutes an article of manufacture that includes instructions implementing various aspects of the functions/actions specified in one or more blocks of the flowcharts and/or block diagrams.
  • The computer-readable program instructions may also be loaded onto a computer, another programmable data processing apparatus, or other equipment, so that a series of operational steps are performed on the computer, the other programmable data processing apparatus, or the other equipment to produce a computer-implemented process, whereby the instructions executed on the computer, the other programmable data processing apparatus, or the other equipment implement the functions/actions specified in one or more blocks of the flowcharts and/or block diagrams.
  • each block in a flowchart or block diagram may represent a module, a program segment, or a portion of instructions, which contains one or more executable instructions.
  • the functions noted in the block may occur out of the order noted in the figures. For example, two blocks in succession may, in fact, be executed substantially concurrently, or they may sometimes be executed in the reverse order, depending upon the functionality involved.
  • each block of the block diagrams and/or flowchart illustrations, and combinations of blocks in the block diagrams and/or flowchart illustrations can be implemented by a dedicated hardware-based system that performs the specified function or action , or may be implemented by a combination of dedicated hardware and computer instructions.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Databases & Information Systems (AREA)
  • Biophysics (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Human Computer Interaction (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Machine Translation (AREA)

Abstract

根据本公开的实施例,提供了一种用于确定响应语句的方法、设备、装置和介质。该方法包括基于源语句序列和候选响应语句,根据响应确定模型,确定候选响应语句与正确响应语句间的匹配得分。基于匹配得分和表示候选响应语句是否是正确响应语句的标注信息来确定第一损失。基于源对话序列、正样本对话序列和负样本对话序列来确定第二损失。源对话序列包括源语句序列和候选响应语句。正样本对话序列至少包括正确响应语句。负样本对话序列包括源语句序列和针对源语句序列的错误响应语句。第二损失表示相比负样本对话序列,正样本对话序列与源对话序列之间的相似程度。该方法还包括基于第一损失和第二损失的值,训练响应确定模型。

Description

用于确定响应语句的方法、设备、装置和介质
相关申请的交叉引用
本申请要求申请号为202111275265.9,题为“用于确定响应语句的方法、设备、装置和介质”、申请日为2021年10月29日的中国发明专利申请的优先权,通过引用方式将该申请整体并入本文。
技术领域
本公开的示例实施例总体涉及计算机领域,特别地涉及用于确定响应语句的方法、设备、装置和计算机可读存储介质。
背景技术
在自然语言处理(NLP)领域中,基于人工智能的各种对话处理技术已经得到显著发展,并且具有广泛应用。自然语言处理可以应用于多种不同的对话处理***和应用。例如,对话处理技术可以应用于能够与用户进行交互的智能对话***,以协助用户执行具体的任务。针对这些对话处理***,通常需要利用诸如语言理解模型等的模型来确定准确的响应语句。所确定的响应语句的准确性影响着对话处理任务的准确性。因此,期望用于确定响应语句的模型具有良好的处理能力和准确性。
发明内容
根据本公开的示例实施例,提供了一种用于确定响应语句的方案。
在本公开的第一方面,提供了一种的方法。该方法包括基于源语句序列和候选响应语句,根据响应确定模型,确定候选响应语句与对应于源语句序列的正确响应语句之间的匹配得分。该方法还包括基于匹配得分和候选响应语句的标注信息来确定第一损失。第一损失表示匹配得分与标注信息之间的相似程度。标注信息表示候选响应语句是否是正确响应语句。该方法还包括基于源对话序列、正样本对话序列 和负样本对话序列来确定第二损失。源对话序列包括源语句序列和候选响应语句。正样本对话序列至少包括正确响应语句。负样本对话序列包括源语句序列和针对源语句序列的错误响应语句。第二损失表示相比负样本对话序列,正样本对话序列与源对话序列之间的相似程度。该方法还包括基于第一损失和第二损失的值,训练响应确定模型。
在本公开的第二方面,提供了一种电子设备。该设备包括至少一个处理单元;以及至少一个存储器,至少一个存储器被耦合到至少一个处理单元并且存储用于由至少一个处理单元执行的指令。指令在由至少一个处理单元执行时使设备执行以下动作:基于源语句序列和候选响应语句,根据响应确定模型,确定候选响应语句与对应于源语句序列的正确响应语句之间的匹配得分;基于匹配得分和候选响应语句的标注信息来确定第一损失,第一损失表示匹配得分与标注信息之间的相似程度,标注信息表示候选响应语句是否是正确响应语句;基于源对话序列、正样本对话序列和负样本对话序列来确定第二损失,源对话序列包括源语句序列和候选响应语句,正样本对话序列至少包括正确响应语句,负样本对话序列包括源语句序列和针对源语句序列的错误响应语句,第二损失表示相比负样本对话序列,正样本对话序列与源对话序列之间的相似程度;以及基于第一损失和第二损失的值,训练响应确定模型。
在本公开的第三方面,提供了一种用于确定响应语句的装置,该装置包括匹配得分确定模块,被配置为基于源语句序列和候选响应语句,根据响应确定模型,确定候选响应语句与对应于源语句序列的正确响应语句之间的匹配得分。该装置还包括第一损失确定模块,被配置为基于匹配得分和候选响应语句的标注信息来确定第一损失。第一损失表示匹配得分与标注信息之间的相似程度。标注信息表示候选响应语句是否是正确响应语句。该装置还包括第二损失确定模块,被配置为基于源对话序列、正样本对话序列和负样本对话序列来确定第二损失。源对话序列包括源语句序列和候选响应语句。正样本对话序列至少包括正确响应语句。负样本对话序列包括源语句序列和针对源语 句序列的错误响应语句。第二损失表示相比负样本对话序列,正样本对话序列与源对话序列之间的相似程度。该装置还包括模型训练模块,被配置为基于第一损失和第二损失的值,来训练响应确定模型。
在本公开的第四方面,提供了一种计算机可读存储介质。介质上存储有计算机程序,程序被处理器执行时实现第一方面的方法。应当理解,本发明内容部分中所描述的内容并非旨在限定本公开的实施例的关键特征或重要特征,也不用于限制本公开的范围。本公开的其它特征将通过以下的描述而变得容易理解。
附图说明
结合附图并参考以下详细说明,本公开各实施例的上述和其他特征、优点及方面将变得更加明显。在附图中,相同或相似的附图标记表示相同或相似的元素,其中:
图1示出了本公开的实施例能够在其中实现的示例环境的示意图;
图2示出了根据本公开的一些实施例的模型训练架构的示意图;
图3示出了根据本公开的一些实施例的正负样本对话序列的示意图;
图4示出了根据本公开的一些实施例的用于确定响应语句的过程的流程图;
图5示出了根据本公开的一些实施例的用于确定响应语句的装置的框图;以及
图6示出了能够实施本公开的多个实施例的设备的框图。
具体实施方式
下面将参照附图更详细地描述本公开的实施例。虽然附图中示出了本公开的某些实施例,然而应当理解的是,本公开可以通过各种形式来实现,而且不应该被解释为限于这里阐述的实施例,相反,提供这些实施例是为了更加透彻和完整地理解本公开。应当理解的是,本公开的附图及实施例仅用于示例性作用,并非用于限制本公开的保护 范围。
在本公开的实施例的描述中,术语“包括”及其类似用语应当理解为开放性包含,即“包括但不限于”。术语“基于”应当理解为“至少部分地基于”。术语“一个实施例”或“该实施例”应当理解为“至少一个实施例”。术语“一些实施例”应当理解为“至少一些实施例”。下文还可能包括其他明确的和隐含的定义。
如本文中所使用的,术语“模型”可以从训练数据中学习到相应的输入与输出之间的关联,从而在训练完成后可以针对给定的输入,生成对应的输出。模型的生成可以基于机器学习技术。深度学习是一种机器学习算法,通过使用多层处理单元来处理输入和提供相应输出。神经网络模型是基于深度学习的模型的一个示例。在本文中,“模型”也可以被称为“机器学习模型”、“学习模型”、“机器学习网络”或“学习网络”,这些术语在本文中可互换地使用。如本文所使用的,属于“确定模型的参数”或类似表达是指确定模型的参数的值(又称为参数值),包括具体值、取值集合或取值范围等。
“神经网络”是一种基于深度学习的机器学习网络。神经网络能够处理输入并且提供相应输出,其通常包括输入层和输出层以及在输入层与输出层之间的一个或多个隐藏层。在深度学习应用中使用的神经网络通常包括许多隐藏层,从而增加网络的深度。神经网络的各个层按顺序相连,从而前一层的输出被提供作为后一层的输入,其中输入层接收神经网络的输入,而输出层的输出作为神经网络的最终输出。神经网络的每个层包括一个或多个节点(也称为处理节点或神经元),每个节点处理来自上一层的输入。
通常,机器学习大致可以包括三个阶段,即训练阶段、测试阶段和应用阶段(也称为推理阶段)。在训练阶段,给定的模型可以使用大量的训练数据进行训练,不断迭代更新模型的参数值,直到模型能够从训练数据中获取一致的满足预期目标的推理。通过训练,模型可以被认为能够从训练数据中学习从输入到输出之间的关联(也称为输入到输出的映射)。训练后的模型的参数值被确定。在测试阶段,将 测试输入应用到训练后的模型,测试模型是否能够提供正确的输出,从而确定模型的性能。在应用阶段,模型可以被用于基于训练得到的参数值,对实际的输入进行处理,确定对应的输出。
在一些机器学习方案中,训练阶段又可以包括预训练和微调。预训练是指针对通用任务来训练模型,即迭代更新模型的参数值。经预训练的模型具有广泛的应用范围,可应用于多种不同的下游任务。微调是指针对将要应用模型的具体下游任务来训练经预训练的模型。微调后的模型更适于处理具体下游任务。
如前文提及的,在自然语言处理领域中,针对各种对话处理***和应用,通常需要利用语言理解模型来确定对应于多轮历史对话的准确的响应语句。例如,在一些应用中,通常需要模型在候选对话语句集合中,确定出针对多轮历史对话的目标响应语句。如何提高模型确定目标响应语句的准确性,是值得关注的问题。
已经提出了一些常规方案来将对话确定作为匹配任务来执行。例如,一些方案使用诸如基于变换器(Transformer)的双向编码器(BERT)来确定候选对话的匹配得分。目前已经提出了一些基于诸如交叉熵的损失函数来训练BERT模型以确定出准确的响应语句的方法。然而,这种基于交叉熵的损失函数的训练方式对于噪声数据(诸如噪声标签或噪声标注信息)敏感。训练数据集中的错误标注的噪声数据会对训练结果造成影响。由基于交叉熵的损失函数所训练出的模型,例如BERT模型无法得到令人满意的结果。
在计算机视觉(CV)领域中,存在一些有监督学习方式以用于模型的训练。例如,基于对比学习的有监督学习方式取得了很好的效果。与之相比,在自然语言处理领域中,这种基于对比学习的有监督学习方式尚未得到应用。因此,在自然语言处理领域中,缺少利用基于对比学习来确定响应语句的有效方案。
根据本公开的实施例,提供了一种用于确定响应语句的方案,旨在解决上述问题以及其他潜在问题中的一个或多个。在该方案中,基于源语句序列和候选响应语句,根据响应确定模型,确定候选响应语 句与正确响应语句之间的匹配得分。基于匹配得分和表示该候选响应语句是否是正确响应语句的标注信息来确定第一损失。基于由源语句序列和候选响应语句组成的源对话序列、包括正确响应语句的正样本对话序列和包括错误响应语句的负样本对话序列来确定第二损失。基于第一损失和第二损失的值,调整响应确定模型的参数。
在该方案中，通过优化第一损失来提高响应确定模型所得到的匹配得分的准确性。此外，通过优化第二损失来使响应确定模型更好地学习正确的样本与错误的样本之间的区分度。以此方式，能够更好地去除数据中的噪声。因此，这是一种基于对比学习的有监督的学习方案。使用这种基于对比学习的有监督的学习方式，能够得到准确的响应语句确定结果。
示例环境
图1示出了本公开的实施例能够在其中实现的示例环境100的示意图。在图1的环境100中,期望训练和应用响应确定模型120来确定出准确的响应语句。在本文中,响应语句指的是在已发生的一轮或多轮对话语句之后的接下来的应答语句。在一些实施例中,对话可以是在两个或多个用户之间发生的。附加地或备选地,对话也可以是在一个或多个用户与一个或多个设备,例如智能语音助手之间发生的。
如图1所示,响应确定模型120被配置为基于源语句序列102和候选响应语句104来生成候选响应语句104的匹配得分130。匹配得分130表示候选响应语句104与对应于源语句序列102的正确响应语句之间的匹配程度。源语句序列102可以包括多个语句。在本文中,源语句序列102可以是之前已发生的一轮或多轮对话。正确响应语句表示紧接着源语句序列102的正确的或真实的下一句对话。
在一些实施例中,源语句序列102中的语句可以是从用户或其他***接收的自然语言输入。在本公开中语句也被称为话语。语句可以具有多种格式,例如口头的、文本的等。在本公开中,文本语句将用于说明目的,但本公开不限于此,并且本公开将涵盖任何形式的语句。
总体而言,环境100包括计算***110。在图1的示例实施例以及下文将会描述的一些示例实施例中,计算***110被配置为利用源语句序列102、候选响应语句104以及附加的其他训练数据(未示出)来训练响应确定模型120,从而确定响应确定模型120的参数。在一些实施例中,响应确定模型120可以是经过预训练的模型。计算***110被配置为利用源语句序列102、候选响应语句104以及附加的其他训练数据来微调响应确定模型120,从而更新和调整响应确定模型120的参数。
应当理解,虽然图1中仅示出了一个源语句序列102和一个候选响应语句104,但这仅仅是示意性的,计算***110可以利用N个源语句序列以及N个候选响应语句以及其他N个附加的其他训练数据,其中N为大于等于1的整数。
如前所述,计算***110还使用附加的其他训练数据来训练或微调响应确定模型120。将在下文中结合图2和图3更详细的描述对响应确定模型120的训练或微调过程。
在对响应确定模型120训练或微调之后，计算系统110可以利用响应确定模型120来确定候选响应语句104的匹配得分130。以这种方式，可以从多个候选响应语句中将匹配得分最高的候选响应语句确定为对应于源语句序列的响应语句。通过使用经训练或微调的响应确定模型，能够得到准确的响应语句。
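As a minimal, non-authoritative sketch of this selection step, the snippet below assumes a `score_fn(context, candidate)` callable that wraps the trained response determination model and returns a matching score; the function name and signature are illustrative assumptions, not part of the disclosure.

```python
from typing import Callable, Sequence

def select_response(
    context: Sequence[str],
    candidates: Sequence[str],
    score_fn: Callable[[Sequence[str], str], float],  # assumed wrapper around the trained model
) -> str:
    """Return the candidate response with the highest matching score for the context."""
    scores = [score_fn(context, candidate) for candidate in candidates]
    best_index = max(range(len(candidates)), key=lambda i: scores[i])
    return candidates[best_index]
```

In deployment, `score_fn` would tokenize the concatenated dialogue and run the fine-tuned model once per candidate response.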
在图1中,计算***110可以是任何具有计算能力的***,例如各种计算设备/***、终端设备、服务器等。终端设备可以是任意类型的移动终端、固定终端或便携式终端,包括移动手机、台式计算机、膝上型计算机、笔记本计算机、上网本计算机、平板计算机、媒体计算机、多媒体平板、或者前述各项的任意组合,包括这些设备的配件和外设或者其任意组合。服务器包括但不限于大型机、边缘计算节点、云环境中的计算设备,等等。
应当理解,图1示出的环境中的部件和布置仅是示例,适于用于实现本公开所描述的示例实施例的计算***可以包括一个或多个不 同的部件、其他部件和/或不同的布置方式。本公开的实施例在此方面不受限制。
以下将继续参考附图,描述模型训练或微调的示例实施例。
模型训练架构
图2示出了根据本公开的一些实施例的模型训练架构200的示例。图2的架构200可以被实现在图1的计算***110中或者其他适当的***或设备中。架构200中的各个模块/组件可以由硬件、软件、固件或者它们的任意组合来实现。
在图2中,除了源语句序列102和候选响应语句104外,还示出了用于模型训练的附加的其他训练数据,诸如正样本对话序列240、负样本对话序列260、源对话序列250以及标注信息204。如图2所示,源对话序列250可以由源语句102和候选响应语句104组成。
图2中的标注信息204是与候选响应语句104相关联的。标注信息204表示该候选响应语句104是否是对应于源语句102的正确响应语句。例如,标注信息204为0可以表示该候选响应语句104是错误的响应语句,而标注信息204为1则可以表示该候选响应语句104是正确响应语句。应当理解,标注信息204也可以采用其他方式(例如,字母“T”或“F”)或其他数值来表示该候选响应语句104是否是正确响应语句。
在一些实施例中,仅使用标注信息为1的候选响应语句104作为训练数据。即,仅选择是正确响应语句的候选响应语句104作为训练数据。附加地或备选地,也可以选择正确的或者错误的候选响应语句两者作为训练数据。
如前所述，响应确定模型120基于源语句序列102和候选响应语句104来生成候选响应语句104的匹配得分130。如图2中所示，计算系统110可以基于匹配得分130和标注信息204，根据第一损失函数210来确定第一损失。第一损失表示匹配得分130与标注信息204之间的相似程度。即，第一损失可以表示匹配得分130是否准确。例如，第一损失的值越大，表示匹配得分130与标注信息204之间越不相似，即匹配得分130越不准确。
在一些实施例中,第一损失函数210可以是,例如交叉熵函数。应当理解,第一损失函数210也可以采用其他的适当的损失函数,例如二次代价函数等,本公开的实施例在此方面不受限制。
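A hedged illustration of the first loss as a binary cross-entropy between the matching score and the 0/1 annotation, assuming the score has already passed through a sigmoid; this is one plausible instantiation of the cross-entropy choice mentioned above, not the only one.

```python
import torch
import torch.nn.functional as F

def first_loss(match_scores: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    """Binary cross-entropy between matching scores in (0, 1) and 0/1 annotations."""
    return F.binary_cross_entropy(match_scores, labels.float())

# Example: two candidates, the first annotated as the correct response.
scores = torch.tensor([0.9, 0.2])
labels = torch.tensor([1, 0])
loss_1 = first_loss(scores, labels)
```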
附加地或备选地,在一些实施例中,还可以基于对应于源语句序列102的匹配得分130、标注信息204,以及正样本对话序列240和负样本对话序列260,根据第一损失函数210,来确定第一损失。将在下文中详细描述正样本对话序列240和负样本对话序列。
图2中还示出了基于源对话序列250、正样本对话序列240和负样本对话序列260,根据第二损失函数220,来确定第二损失。如前所述,源对话序列250包括源语句序列102和候选响应语句104。正样本对话序列240至少包括正确响应语句,而负样本对话序列260包括源语句序列102和针对源语句序列102的错误响应语句。第二损失表示相比负样本对话序列260,正样本对话序列240与源对话序列250之间的相似程度。例如,第二损失的值越高,则表明相比负样本对话序列260,正样本对话序列240与源对话序列250之间的相似程度越低。在这种第二损失值较高的情况下,也表明响应确定模型120没有很好地学习到正负样本对话序列之间的区别特征。
在一些实施例中,正样本对话序列240可以通过数据增广操作而生成。例如,可以基于源语句序列102和正确响应语句来确定正样本对话序列。图3示出了根据本公开的一些实施例的正负样本对话序列的示意图。
如图3所示,对话序列300表示从训练数据集中获取的正确的对话序列。图3中的A与B分别表示进行对话的双方,例如两个用户或者一个用户与一个对话设备。如图3所示,语句序列305包括A与B之间的之前进行的对话的语句序列305和紧接着语句序列305的响应语句310。应当理解,图3中所示出的语句序列305仅仅是示意性的,在一些实施例中,语句序列305可以涉及除A与B之外的其他 用户。语句序列305还可以包括更少的或者更多的语句。
在一些实施例中,可以将对话序列300用作源对话序列250。例如,可以将语句序列305用作源语句序列102,并且将响应语句310用作候选响应语句104。在这一示例中,与候选响应语句104相关联的标注信息204可以表示该候选响应语句104是正确响应语句(例如,标注信息204可以为1)。
图3中还示出了负样本对话序列320。负样本对话序列320包括语句序列305以及错误响应语句330。错误响应语句330可以是从训练数据集中任选的与正确响应语句310不同的语句。
图3中还示出了通过数据增广操作而生成的正样本对话序列340。在正样本对话序列340中,语句序列305中的两个语句被交换了顺序。即,经交换顺序的语句序列345与语句序列305相比,语句350与语句355的顺序被交换了。应当理解,虽然图3的示例中,仅示出了其中两个语句的顺序发生了交换,但也可以将语句序列305中的多于两个语句的语句交换顺序。在一些实施例中,经交换顺序的语句不是语句序列305中的最后语句。如图3所示,将经交换顺序的语句序列345与正确响应语句310组合,可以获得正样本对话序列340。
图3中还示出了通过另一数据增广操作而生成的正样本对话序列360。在正样本对话序列360中,语句序列305中的某个语句中的词语被改变了顺序。即,经改变词语顺序的语句序列365与语句序列305相比,语句370中的词语的顺序被改变了。应当理解,虽然图3的示例中,仅示出了其中一个语句中词语的顺序发生了交换,但也可以将语句序列305中的两个或更多的语句中的词语顺序进行改变。在一些实施例中,经改变词语顺序的语句不是语句序列305中的最后语句。如图3所示,将经改变词语顺序的语句序列365与正确响应语句310组合,可以获得正样本对话序列360。
应当理解,以上所描述的两种数据增广操作的示例仅仅是示意性的,还可以使用其他数据增广操作来获取正样本对话序列。附加地或备选地,以上所描述的两种示例数据增广操作可以相结合。例如,可 以通过改变语句序列305中的某些语句的顺序,并且同时改变语句序列305中的某个或某些语句中词语的顺序,以获得正样本对话序列。
应当理解,虽然在图3的示例中,以英语语句为例示出了各个对话序列。在一些实施例中,对话序列可以采用任意适当的语言,例如还可以使用中文的对话语句序列。
通过使用诸如上述描述的数据增广操作来获取正样本对话序列,能够获得更多的正样本对话序列以更好地对模型进行训练或微调。通过使用正样本对话序列以及负样本对话序列,能够使模型更好地学习正负样本之间的区别,从而降低噪声对模型的影响。
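The two augmentation operations can be sketched as below; the random choices, the rule of never moving the last utterance, and the whitespace word split are illustrative assumptions.

```python
import random
from typing import List

def swap_two_utterances(context: List[str]) -> List[str]:
    """Swap two utterances in the context; the last utterance is never moved."""
    augmented = list(context)
    if len(augmented) >= 3:
        i, j = random.sample(range(len(augmented) - 1), 2)
        augmented[i], augmented[j] = augmented[j], augmented[i]
    return augmented

def shuffle_words_in_one_utterance(context: List[str]) -> List[str]:
    """Shuffle word order inside one utterance that is not the last utterance."""
    augmented = list(context)
    if len(augmented) >= 2:
        k = random.randrange(len(augmented) - 1)
        words = augmented[k].split()
        random.shuffle(words)
        augmented[k] = " ".join(words)
    return augmented

def build_positive_dialogue(context: List[str], correct_response: str) -> List[str]:
    """A positive sample pairs an augmented context with the correct response."""
    return swap_two_utterances(context) + [correct_response]
```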
以上结合图3描述了示例正负样本对话序列。继续参考图2,计算***110可以基于源对话序列250、正样本对话序列240和负样本对话序列260,根据第二损失函数220,来确定第二损失。在一些实施例中,计算***110可以将所确定的第一损失的值以及所确定的第二损失的值传输给响应确定模型120。通过这样,计算***110可以利用第一损失的值和第二损失的值,通过反向传播的方式,训练响应确定模型120。
在一些实施例中,计算***110可以预先设置第一损失阈值和第二损失阈值。当确定第一损失的值不超过第一损失阈值并且第二损失的值不超过第二损失阈值时,计算***110完成对响应确定模型120的训练。以这种方式,可以简化响应确定模型120的训练过程。此外,通过预先设置的第一损失阈值和第二损失阈值,可以确保训练的响应确定模型120具有较高的准确度。
附加地或备选地,在一些实施例中,计算***110还可以基于第一损失与第二损失的线性组合,来获得总损失230。例如,计算***110可以将第一损失与第二损失相加而确定出总损失230。总损失230的值可以反映出响应确定模型120的性能表现。总损失230的值越小,表示响应确定模型120的结果越准确。计算***110可以将总损失230传输给响应确定模型120,通过优化总损失230来训练响应确定模型120。
在一些实施例中,计算***110还可以改变第一损失与第二损失在总损失230中各自的权重,以获得总损失230。例如,在以上所描述的两者相加的示例中,第一损失与第二损失在总损失230中各自的权重均为1。在一些实施例中,可以改变第一损失或第二损失的权重,例如将第一损失的权重设置为高于第二损失、或者低于第二损失。通过动态调整第一损失与第二损失的权重,可以更好地适应不同的训练数据集以及适应不同的响应确定模型。
根据总损失230的值,计算***110可以通过例如反向传播来调整响应确定模型120的参数。例如,计算***110可以通过最小化总损失230,来优化响应确定模型120的参数。附加地或备选地,在一些实施例中,计算***110还可以预先设置总损失阈值。当确定总损失230低于总损失阈值时,计算***110完成对响应确定模型120的训练。以这种方式,可以简化响应确定模型120的训练过程。此外,通过预先设置的总损失阈值,由此所训练的响应确定模型120的准确度可以得到保证。
在一些实施例中,响应确定模型120可以包括预训练的语言模型。例如,响应确定模型120可以包括预训练的BERT模型或者其他适当的神经网络模型。例如,计算***110可以基于源对话序列250、正样本对话序列240和负样本对话序列260,根据预训练的语言模型,分别确定源对话标记序列、正样本对话标记序列和负样本对话标记序列。对话标记序列也被称为对话的隐式表示。
在一些实施例中，预训练的语言模型在被应用之前，计算系统110还可以对其进行后训练。例如，可以从训练数据集中任选样本数据 $(c_i, r_i, y_i)$，其中 $c_i = \{u_{i,1}, u_{i,2}, \ldots, u_{i,l_i}\}$ 表示源语句序列，$l_i$ 表示 $c_i$ 中语句的数目，$u_{i,k}$ 表示 $c_i$ 中的语句，$r_i$ 表示候选响应语句，$y_i \in \{0,1\}$ 表示 $r_i$ 的标注信息。根据样本数据 $(c_i, r_i, y_i)$ 可以得到样本上下文对话数据 $d_i = c_i \cup \{r_i\}$。
在一些实施例中，可以对样本上下文对话数据 $d_i$ 进行数据增广操作。例如，可以以50%的可能性将样本上下文对话数据 $d_i$ 截掉一部分，而获得新的样本上下文对话数据 $d'_i$，$d'_i$ 中至少包括两个语句。对于另外的50%可能性，使 $d'_i = d_i$。
附加地或备选地，针对下一语句预测（NSP）任务，采样语句。例如，以25%的概率，生成以 $d_i$ 的最后语句作为响应语句并且以 $d_i$ 的其他语句作为响应语句之前的上下文的正样本。此外，针对负样本，以三分之二的从语料库中任选的错误的响应语句和上下文语句，以及从相同上下文中选择的三分之一的语句来构建负样本 $d''_i$。
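A rough sketch of the 50% truncation step described above; the choice of truncation point (a random prefix of at least two utterances) is an assumption beyond what the text specifies.

```python
import random
from typing import List

def truncate_for_post_training(dialogue: List[str]) -> List[str]:
    """With probability 0.5 keep the dialogue unchanged; otherwise keep a random
    prefix containing at least two utterances (the prefix choice is an assumption)."""
    if len(dialogue) <= 2 or random.random() < 0.5:
        return list(dialogue)
    cut = random.randint(2, len(dialogue) - 1)
    return dialogue[:cut]
```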
在一些实施例中,基于以上所描述的正负样本,针对诸如掩码语言模型(MLM)等任务,对预训练的语言模型进行后训练。MLM从输入中随机掩盖了一些标记,然后根据其上下文信息预测被掩盖的标记。在一些实施例中,响应确定模型120可以包括利用诸如以上所描述的方式经过后训练的语言模型。
以下将详细描述使用预训练或者预训练及后训练的语言模型确定对话标记序列的过程。以源对话序列250为例，计算系统110可以将源对话序列250表示为如下：
$x_i = \big\{[\mathrm{CLS}],\ u_{i,1},\ [\mathrm{EOT}],\ u_{i,2},\ [\mathrm{EOT}],\ \ldots,\ u_{i,l_i},\ [\mathrm{EOT}],\ r_i,\ [\mathrm{SEP}]\big\}$    (1)
其中，$i$ 表示源对话序列250是第 $i$ 个源对话序列，$x_i$ 表示源对话序列250，$c_i = \{u_{i,1}, u_{i,2}, \ldots, u_{i,l_i}\}$ 表示源语句序列102，$r_i$ 表示候选响应语句104。源对话序列250的开始元素“[CLS]”为语句分类标识，其表示对话序列的分类。元素“[SEP]”为对话分隔标识，其表示对话序列与响应语句之间的分隔。元素“[EOT]”为语句结束标识，其用于分隔对话序列间的不同语句。
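A minimal sketch of assembling the flattened input before tokenization; the exact placement of [SEP] relative to the candidate response follows the reconstruction of Eq. (1) above and should be treated as an assumption.

```python
from typing import List

def build_model_input(context: List[str], candidate_response: str) -> str:
    """Concatenate context utterances and the candidate response with the special markers of Eq. (1)."""
    parts = ["[CLS]"]
    for utterance in context:
        parts.extend([utterance, "[EOT]"])
    parts.extend([candidate_response, "[SEP]"])
    return " ".join(parts)

# Example
text = build_model_input(["Hi, how are you?", "Fine, thanks."], "Glad to hear that!")
```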
在一些实施例中，可以将例如表示为式(1)的源对话序列250输入到预训练的语言模型，以获得与之相对应的源对话标记序列。对话标记序列在本文中也被称为对话嵌入(Embedding)序列。示例源对话标记序列可以表示为如下：
$\{E_{[\mathrm{CLS}]},\ E_{m,n,1},\ E_{m,n,2},\ \ldots,\ E_{m,n,k},\ E_{[\mathrm{SEP}]}\}$    (2)
其中 $E_{[\mathrm{CLS}]}$ 表示对应于源对话序列250的开始元素“[CLS]”的分类标记，$E_{[\mathrm{SEP}]}$ 表示对应于源对话序列250的末尾元素“[SEP]”的标记，其他 $E_{m,n,1}, E_{m,n,2}, \ldots, E_{m,n,k}$ 表示对应于源对话序列250中的其他元素的标记。
附加地或备选地，在一些实施例中，还可以使用数据增广操作来获得经数据增广的分类标记 $E'_{[\mathrm{CLS}]}$。例如，可以对源对话序列250进行不同的掩码，并将经掩码的源对话序列250输入到经训练的语言模型，从而获得经数据增广的分类标记 $E'_{[\mathrm{CLS}]}$。通过数据增广操作，可以获得更多的训练数据，从而能够更好地训练模型。
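One way to realize this masking-based augmentation, sketched here as a token-level random mask; the 10% masking rate and the token-list interface are assumptions.

```python
import random
from typing import List

def random_token_mask(tokens: List[str], mask_token: str = "[MASK]", prob: float = 0.1) -> List[str]:
    """Randomly replace a fraction of tokens with the mask token. Encoding two
    differently masked copies of the same dialogue yields two augmented [CLS] embeddings."""
    return [mask_token if random.random() < prob else token for token in tokens]
```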
在一些实施例中，可以将由经训练的语言模型所输出的 $E_{[\mathrm{CLS}]}$（即分类标记）作为源对话序列250的代表标记表示。计算系统110可以基于源对话序列250的分类标记，根据响应确定模型120，来确定匹配得分130。
在一些实施例中，响应确定模型120可以包括预训练的分类器。例如，响应确定模型120可以包括预训练的两层神经网络分类器。预训练的分类器可以通过例如以下方式来确定对应于源对话序列250（即，候选响应语句104）的匹配得分130：
$F(c_m, r_m) = \sigma_2\big(W_o\,\sigma_1(W_h E_{[\mathrm{CLS}]} + b_h) + b_o\big)$     (3)
其中，$W_o$、$W_h$、$b_o$ 和 $b_h$ 表示分类器的可调整的参数。$\sigma_1$ 是激活函数，例如GELU激活函数。$\sigma_2$ 是sigmoid函数。$F(c_m, r_m)$ 表示匹配得分130。在一些实施例中，匹配得分130可以是介于0与1之间的数值。例如，匹配得分130越接近1，越表示候选响应语句104是正确响应语句的可能性越大。即，匹配得分130越接近1，越表示候选响应语句104越接近正确响应语句。
应当理解，以上所描述的分类器仅仅是示意性的，而不是限制性的。可以采用其他适当的分类器，也可以使用其他适当的激活函数。分类器的参数（例如 $W_o$、$W_h$、$b_o$ 和 $b_h$）可以通过优化总损失230来被调整和优化。接下来将描述总损失230的示例确定方法。
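A hedged PyTorch sketch of a two-layer classification head with the shape of Eq. (3); the hidden size of 768 and the GELU activation are assumptions borrowed from common BERT configurations.

```python
import torch
import torch.nn as nn

class MatchingHead(nn.Module):
    """Two-layer head mapping the [CLS] embedding to a matching score in (0, 1)."""

    def __init__(self, hidden_size: int = 768) -> None:
        super().__init__()
        self.hidden = nn.Linear(hidden_size, hidden_size)  # W_h, b_h
        self.out = nn.Linear(hidden_size, 1)               # W_o, b_o
        self.act = nn.GELU()                               # sigma_1

    def forward(self, cls_embedding: torch.Tensor) -> torch.Tensor:
        logit = self.out(self.act(self.hidden(cls_embedding)))
        return torch.sigmoid(logit).squeeze(-1)            # sigma_2

# Example: matching scores for a batch of four [CLS] embeddings.
head = MatchingHead()
scores = head(torch.randn(4, 768))
```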
如前所述，可以通过第一损失与第二损失的线性组合，例如将第一损失与第二损失相加来确定总损失230。第一损失可以通过使用诸如交叉熵的第一损失函数210来确定。第二损失可以通过使用第二损失函数220来确定。以下将详细描述根据第二损失函数220确定第二损失的若干示例。
如前所述,可以基于源对话序列250,根据预训练的语言模型,确定源对话标记序列以及确定源对话序列250的分类标记及增广的分类标记。类似地,计算***110可以基于正样本对话序列240和负样本对话序列260,根据预训练的语言模型,分别确定正样本对话标记序列和负样本对话标记序列。此外,还可以分别确定对应于正样本对话序列240和负样本对话序列260的正样本分类标记和负样本分类标记。
在一些实施例中,可以基于源对话标记序列与正样本对话标记序列之间的相似程度以及源对话标记序列与负样本对话标记序列之间的相似程度,来确定第二损失。例如,可以基于源对话分类标记、正样本分类标记和负样本分类标记,根据第二损失函数220,确定第二损失。
应当理解,虽然图2中仅示出了一个源对话序列250、一个正样本对话序列240,以及一个负样本对话序列260,但是也可以采用源对话序列集合以及相应的正样本对话序列集合和负样本对话序列集合来训练响应确定模型120。
以下描述了基于源对话序列集合以及相应的正样本对话序列集合和负样本对话序列集合，根据示例第二损失函数220确定第二损失的示例：
$\mathcal{L}_{2} = -\sum_{i=1}^{N} \log \dfrac{\exp\big(f(h_i, h_i^{+})/\tau\big)}{Z(H, H^{+}, H^{-})}$    (4)
其中，$\mathcal{L}_{2}$ 表示第二损失，$H$、$H^{+}$ 和 $H^{-}$ 分别表示源对话分类标记集合、正样本分类标记集合和负样本分类标记集合。$H$、$H^{+}$ 和 $H^{-}$ 分别对应于源对话序列集合、正样本对话序列集合和负样本对话序列集合。在式(4)所描述的示例中，$H$、$H^{+}$ 和 $H^{-}$ 各自包括 $N$ 个分类标记。$N$ 可以是任意自然数。$h_i$ 与 $h_i^{+}$ 分别表示源对话分类标记集合和正样本分类标记集合中的第 $i$ 个分类标记。$f(\cdot)$ 表示相似度函数。例如，可以使用余弦相似度函数来作为相似度函数。$f(h_i, h_i^{+})$ 表示源对话分类标记集合和正样本分类标记集合中的第 $i$ 个分类标记之间的相似度。$\tau$ 表示预设的温度参数。$\tau$ 可以是介于0与1之间的数值。温度参数 $\tau$ 可以根据实际应用的训练数据集而被调整。
$Z(H, H^{+}, H^{-})$ 可以用来表示源对话分类标记集合中的源对话分类标记与正样本分类标记集合和负样本分类标记集合中的其他分类标记的相似程度。在一些实施例中，可以使用以下方式来计算 $Z(H, H^{+}, H^{-})$：
$Z(H, H^{+}, H^{-}) = \sum_{j=1}^{N} \Big[\exp\big(f(h_i, h_j^{+})/\tau\big) + \exp\big(f(h_i, h_j^{-})/\tau\big)\Big]$    (5)
其中 $h_j^{+}$ 与 $h_j^{-}$ 分别表示正样本分类标记集合和负样本分类标记集合中的第 $j$ 个分类标记（$j$ 可以不等于 $i$）。通过使用以上方式所确定的第二损失，能够反映出相比负样本对话序列，正样本对话序列与源对话序列之间的相似程度。
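A hedged sketch of a contrastive second loss built from the ingredients described above (cosine similarity, temperature τ, and a denominator covering positive and negative classification tokens). The exact formula behind the unreproduced images may differ; this is one standard instantiation.

```python
import torch
import torch.nn.functional as F

def second_loss(h: torch.Tensor, h_pos: torch.Tensor, h_neg: torch.Tensor,
                tau: float = 0.1) -> torch.Tensor:
    """Contrastive loss over [CLS] embeddings of source (h), positive (h_pos), and
    negative (h_neg) dialogue sequences, each of shape (N, d)."""
    h = F.normalize(h, dim=-1)
    h_pos = F.normalize(h_pos, dim=-1)
    h_neg = F.normalize(h_neg, dim=-1)
    sim_pos = h @ h_pos.t() / tau   # (N, N) source-vs-positive cosine similarities
    sim_neg = h @ h_neg.t() / tau   # (N, N) source-vs-negative cosine similarities
    numerator = torch.diag(sim_pos)  # aligned positive pair for each source sequence
    denominator = torch.logsumexp(torch.cat([sim_pos, sim_neg], dim=1), dim=1)
    return (denominator - numerator).mean()
```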
以上描述了第二损失函数220的一个示例。在一些实施例中，还可以使用其他的第二损失函数。以下示出了另一示例第二损失函数220：
$\mathcal{L}_{2} = -\sum_{i=1}^{N} \log \dfrac{\exp\big(f(h_i, h_i^{+})/\tau\big)}{Z''(H, H^{+}, H^{-})}$    (6)
与(4)类似，(6)中的第二损失函数使用了不同于(4)的 $Z''(H, H^{+}, H^{-})$。在一些实施例中，$Z''(H, H^{+}, H^{-})$ 可以被计算为如下：
$Z''(H, H^{+}, H^{-}) = Z(H, H^{+}, H^{-}) + \alpha \sum_{j=1}^{N} \mathbb{1}[i = j]\,\exp\big(f(h_i, h_j^{-})/\tau\big)$    (7)
与(5)相比，$Z''(H, H^{+}, H^{-})$ 增加了惩罚项 $\alpha \sum_{j} \mathbb{1}[i = j]\,\exp\big(f(h_i, h_j^{-})/\tau\big)$，其中 $\alpha$ 表示惩罚系数。惩罚系数 $\alpha$ 可以是介于0与1之间的数值。惩罚系数 $\alpha$ 可以根据不同的训练数据集而被调整。指示函数 $\mathbb{1}[i = j]$ 仅当 $i = j$ 时为1，其他情况均为0。
通过使用以上参考(6)与(7)所描述的第二损失函数，能够在 $h_i$ 与 $h_i^{-}$ 之间的相似度 $f(h_i, h_i^{-})$ 之上增加惩罚项。通过这样，能够增加负样本对话序列在 $Z''(H, H^{+}, H^{-})$ 中的占比。以这种方式，特别是对于使用从质量高的训练数据集中获得的负样本对话序列的情况，能够更加充分地学习到正负样本序列之间的差别特征。
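The penalized variant can be sketched by enlarging the aligned negative term in the denominator; putting the penalty α on the i = j negative pair follows the description of Eq. (7), while everything else mirrors the previous sketch and remains an assumption.

```python
import torch
import torch.nn.functional as F

def second_loss_penalized(h: torch.Tensor, h_pos: torch.Tensor, h_neg: torch.Tensor,
                          tau: float = 0.1, alpha: float = 0.5) -> torch.Tensor:
    """Contrastive loss with an extra penalty alpha on the similarity between each
    source embedding and its own negative sample, as in the Z'' denominator."""
    h = F.normalize(h, dim=-1)
    h_pos = F.normalize(h_pos, dim=-1)
    h_neg = F.normalize(h_neg, dim=-1)
    sim_pos = h @ h_pos.t() / tau
    sim_neg = h @ h_neg.t() / tau
    # Z'' adds alpha * exp(f(h_i, h_i^-)/tau) on top of the ordinary denominator.
    z = sim_pos.exp().sum(dim=1) + sim_neg.exp().sum(dim=1) \
        + alpha * torch.diag(sim_neg).exp()
    numerator = torch.diag(sim_pos)
    return (z.log() - numerator).mean()
```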
在一些实施例中，在确定了第一损失与第二损失之后，可以通过以下方式来确定总损失230：
$\mathcal{L} = \mathcal{L}_{1} + \lambda\,\mathcal{L}_{2}$    (8)
其中，$\mathcal{L}$ 表示总损失230，$\mathcal{L}_{1}$ 表示第一损失，$\mathcal{L}_{2}$ 表示第二损失，$\lambda$ 表示权重系数。权重系数 $\lambda$ 的值可以是预先设置的，例如可以是介于0与1之间的数值。权重系数 $\lambda$ 的值也可以根据训练数据集或者实际应用而被调整。通过使用权重系数 $\lambda$，可以调整第一损失与第二损失在总损失230中的比重，进而能够更好地适应不同的训练数据集或者不同的实际应用。
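Combining the two terms as described, with λ weighting the contrastive term; whether λ multiplies the first or the second loss in the original Eq. (8) is not recoverable from the text, so this placement is an assumption.

```python
import torch

def total_loss(loss_first: torch.Tensor, loss_second: torch.Tensor,
               lam: float = 0.5) -> torch.Tensor:
    """Weighted sum of the first (cross-entropy) and second (contrastive) losses."""
    return loss_first + lam * loss_second
```

During fine-tuning, this scalar is what backpropagation minimizes in order to update both the classifier head and the language model parameters.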
在一些实施例中，基于上文所描述的损失函数（例如，式(8)），可以进行多次迭代优化，直到迭代收敛或达到预定次数，则模型训练完成。通过这样，可以调整和优化响应确定模型120的参数。例如，可以调整和优化诸如 $W_o$、$W_h$、$b_o$ 和 $b_h$ 等的分类器的参数。附加地，还可以调整预训练的语言模型的参数。
附加地或备选地,在一些实施例中,计算***110还可以预先设置总损失阈值。当确定总损失230的值低于总损失阈值时,计算***110完成对响应确定模型120的训练。以这种方式,可以简化响应确定模型120的训练过程。此外,通过预先设置的总损失阈值,由此训 练的响应确定模型120的准确度可以得到保证。应当理解,总损失阈值可以根据不同的训练数据集、不同的响应确定模型以及不同的任务而进行调整。
应当理解,虽然以上所描述的响应确定模型包括经过预训练的模型,但在一些实施例中,响应确定模型可以包括未经预训练的模型。该未经预训练的模型可以具有初始参数。未经预训练的响应确定模型可以使用参考图2所描述的训练过程来被训练,以调整该模型的参数。
通过以上所描述的方式,可以通过基于对比学习的方式实现确定响应语句。以这种方式训练或微调的模型,能够更好的学习正负样本对话之间的差别特征。以这样,能够去除噪声数据对模型的影响,进而能够确定出更加准确的响应语句。
此外，本方案的模型训练或微调过程也可以被应用于除了确定响应语句之外的其他的自然语言处理任务，诸如话语理解、意图分类等等。使用本文所描述的训练过程所得到的模型可以应用于诸如数字助理、聊天机器人、自动化系统和其他这样的系统等对话系统中。
示例结果
表1示出了基于不同的训练数据集、使用本方案的训练方式得到的响应确定模型在响应语句确定任务中的准确性与常规模型的准确性的对比。
表1模型准确度对比
如表1所示，使用本方案所训练的模型，与常规方案的模型相比，在各种不同的训练数据集上，均具有更好的准确性。通过使用本方案这种基于对比学习的训练方式，能够得到表现更好的响应对话确定模型。利用本方案的响应对话确定模型的对话系统，能够具有更好的表现。
示例过程
图4示出了根据本公开的一些实施例的用于确定响应语句的过程400的流程图。过程400可以被实现在计算***110处。过程400也可以被实现在其他适当的计算***或者计算设备处。
在框410处,基于源语句序列102和候选响应语句104,根据响应确定模型120,确定候选响应语句104与对应于源语句序列102的正确响应语句之间的匹配得分130。在框420处,基于匹配得分130和候选响应语句104的标注信息来确定第一损失。第一损失表示匹配得分130与标注信息之间的相似程度。标注信息表示候选响应语句104是否是正确响应语句。例如,标注信息为1表示该候选响应语句104是正确响应语句,而标注信息为0表示该候选响应语句104是错误的响应语句。
在框430处,基于源对话序列、正样本对话序列和负样本对话序列来确定第二损失。源对话序列包括源语句序列102和候选响应语句104。正样本对话序列至少包括正确响应语句。负样本对话序列包括源语句序列102和针对源语句序列102的错误响应语句。第二损失表示相比负样本对话序列,正样本对话序列与源对话序列之间的相似程度。
在一些实施例中,可以基于源语句序列102和正确响应语句来确 定正样本对话序列。例如,可以交换源语句序列102中的第一语句和第二语句的顺序。第一语句和第二语句不是源语句序列102中的最后语句。还可以基于经交换语句顺序的源语句序列102和正确响应语句,来确定正样本对话序列。
附加地或备选地,可以改变源语句序列102中的第三语句中的词语的顺序。第三语句不是源语句序列102中的最后语句。还可以基于经改变词语顺序的源语句序列102和正确响应语句,来确定正样本对话序列。
在一些实施例中,响应确定模型120可以包括预训练的语言模型。为了确定第二损失,可以基于源对话序列、正样本对话序列和负样本对话序列,根据预训练的语言模型,分别确定源对话标记序列、正样本对话标记序列和负样本对话标记序列。还可以基于源对话标记序列与正样本对话标记序列之间的相似程度以及源对话标记序列与负样本对话标记序列之间的相似程度,来确定第二损失。
在框440处,基于第一损失和第二损失的值,训练响应确定模型120。在一些实施例中,响应确定模型120可以包括预训练的语言模型。在一些实施例中,响应确定模型120还可以包括预训练的分类器。为了确定匹配得分,可以基于源语句序列102和候选响应语句104,根据预训练的语言模型,确定源对话分类标记。还可以基于源对话分类标记,根据预训练的分类器,确定匹配得分。在一些实施例中,可以通过最小化第一损失与第二损失的和,调整预训练的分类器和预训练的语言模型各自的参数,以获得经训练的响应确定模型120。
示例装置和设备
图5示出了根据本公开的一些实施例的用于确定响应语句的装置500的框图。装置500可以被实现为或者被包括在例如计算***110中。装置500中的各个模块/组件可以由硬件、软件、固件或者它们的任意组合来实现。
如图所示,装置500包括匹配得分确定模块510,被配置为基于 源语句序列102和候选响应语句104,根据响应确定模型120,确定候选响应语句104与对应于源语句序列102的正确响应语句之间的匹配得分130。装置500还包括第一损失确定模块520,被配置为基于匹配得分130和候选响应语句104的标注信息来确定第一损失。第一损失表示匹配得分130与标注信息之间的相似程度。标注信息表示候选响应语句104是否是正确响应语句。
装置500还包括第二损失确定模块530,被配置为基于源对话序列、正样本对话序列和负样本对话序列来确定第二损失。源对话序列包括源语句序列102和候选响应语句104。正样本对话序列至少包括正确响应语句。负样本对话序列包括源语句序列102和针对源语句序列102的错误响应语句。第二损失表示相比负样本对话序列,正样本对话序列与源对话序列之间的相似程度。
在一些实施例中,还可以包括正样本对话序列确定模块,被配置为基于源语句序列102和正确响应语句来确定正样本对话序列。例如,在一些实施例中,正样本对话序列确定模块可以被配置为交换源语句序列102中的第一语句和第二语句的顺序。第一语句和第二语句不是源语句序列102中的最后语句。正样本对话序列确定模块还可以被配置为基于经交换语句顺序的源语句序列102和正确响应语句,来确定正样本对话序列。
附加地或备选地,在一些实施例中,正样本对话序列确定模块可以被配置为改变源语句序列102中的第三语句中的词语的顺序。第三语句不是源语句序列102中的最后语句。正样本对话序列确定模块还可以被配置为基于经改变词语顺序的源语句序列102和正确响应语句,来确定正样本对话序列。
在一些实施例中,响应确定模型120可以包括预训练的语言模型。第二损失确定模块530可以被配置为基于源对话序列、正样本对话序列和负样本对话序列,根据预训练的语言模型,分别确定源对话标记序列、正样本对话标记序列和负样本对话标记序列。第二损失确定模块530还可以被配置为基于源对话标记序列与正样本对话标记序列之 间的相似程度以及源对话标记序列与负样本对话标记序列之间的相似程度,来确定第二损失。
装置500还包括模型训练模块540,被配置为基于第一损失和第二损失,训练响应确定模型120。
在一些实施例中,响应确定模型120可以包括预训练的语言模型。在一些实施例中,响应确定模型120还可以包括预训练的分类器。在一些实施例中,匹配得分确定模块510可以被配置为基于源语句序列102和候选响应语句104,根据预训练的语言模型,确定源对话分类标记。匹配得分确定模块510还可以被配置为基于源对话分类标记,根据预训练的分类器,确定匹配得分。附加地或备选地,在一些实施例中,模型训练模块540可以被配置为可以通过最小化第一损失与第二损失的和,调整预训练的分类器和预训练的语言模型各自的参数,以获得经训练的响应确定模型120。
图6示出了示出了其中可以实施本公开的一个或多个实施例的计算设备600的框图。应当理解,图6所示出的计算设备600仅仅是示例性的,而不应当构成对本文所描述的实施例的功能和范围的任何限制。图6所示出的计算设备600可以用于实现图1的计算***110。
如图6所示,计算设备600是通用计算设备的形式。计算设备600的组件可以包括但不限于一个或多个处理器或处理单元610、存储器620、存储设备630、一个或多个通信单元640、一个或多个输入设备650以及一个或多个输出设备660。处理单元610可以是实际或虚拟处理器并且能够根据存储器620中存储的程序来执行各种处理。在多处理器***中,多个处理单元并行执行计算机可执行指令,以提高计算设备600的并行处理能力。
计算设备600通常包括多个计算机存储介质。这样的介质可以是计算设备600可访问的任何可以获得的介质,包括但不限于易失性和非易失性介质、可拆卸和不可拆卸介质。存储器620可以是易失性存储器(例如寄存器、高速缓存、随机访问存储器(RAM))、非易失性存储器(例如,只读存储器(ROM)、电可擦除可编程只读存储器 (EEPROM)、闪存)或它们的某种组合。存储设备630可以是可拆卸或不可拆卸的介质,并且可以包括机器可读介质,诸如闪存驱动、磁盘或者任何其他介质,其可以能够用于存储信息和/或数据(例如用于训练的训练数据)并且可以在计算设备600内被访问。
计算设备600可以进一步包括另外的可拆卸/不可拆卸、易失性/非易失性存储介质。尽管未在图6中示出,可以提供用于从可拆卸、非易失性磁盘(例如“软盘”)进行读取或写入的磁盘驱动和用于从可拆卸、非易失性光盘进行读取或写入的光盘驱动。在这些情况中,每个驱动可以由一个或多个数据介质接口被连接至总线(未示出)。存储器620可以包括计算机程序产品625,其具有一个或多个程序模块,这些程序模块被配置为执行本公开的各种实施例的各种方法或动作。
通信单元640实现通过通信介质与其他计算设备进行通信。附加地,计算设备600的组件的功能可以以单个计算集群或多个计算机器来实现,这些计算机器能够通过通信连接进行通信。因此,计算设备600可以使用与一个或多个其他服务器、网络个人计算机(PC)或者另一个网络节点的逻辑连接来在联网环境中进行操作。
输入设备650可以是一个或多个输入设备,例如鼠标、键盘、追踪球等。输出设备660可以是一个或多个输出设备,例如显示器、扬声器、打印机等。计算设备600还可以根据需要通过通信单元640与一个或多个外部设备(未示出)进行通信,外部设备诸如存储设备、显示设备等,与一个或多个使得用户与计算设备600交互的设备进行通信,或者与使得计算设备600与一个或多个其他计算设备通信的任何设备(例如,网卡、调制解调器等)进行通信。这样的通信可以经由输入/输出(I/O)接口(未示出)来执行。
根据本公开的示例性实现方式,提供了一种计算机可读存储介质,其上存储有计算机可执行指令,其中计算机可执行指令被处理器执行以实现上文描述的方法。根据本公开的示例性实现方式,还提供了一种计算机程序产品,计算机程序产品被有形地存储在非瞬态计算机可 读介质上并且包括计算机可执行指令,而计算机可执行指令被处理器执行以实现上文描述的方法。
这里参照根据本公开实施例的方法、装置、设备和计算机程序产品的流程图和/或框图描述了本公开的各个方面。应当理解,流程图和/或框图的每个方框以及流程图和/或框图中各方框的组合,都可以由计算机可读程序指令实现。
这些计算机可读程序指令可以提供给通用计算机、专用计算机或其他可编程数据处理装置的处理单元,从而生产出一种机器,使得这些指令在通过计算机或其他可编程数据处理装置的处理单元执行时,产生了实现流程图和/或框图中的一个或多个方框中规定的功能/动作的装置。也可以把这些计算机可读程序指令存储在计算机可读存储介质中,这些指令使得计算机、可编程数据处理装置和/或其他设备以特定方式工作,从而,存储有指令的计算机可读介质则包括一个制造品,其包括实现流程图和/或框图中的一个或多个方框中规定的功能/动作的各个方面的指令。
可以把计算机可读程序指令加载到计算机、其他可编程数据处理装置、或其他设备上,使得在计算机、其他可编程数据处理装置或其他设备上执行一系列操作步骤,以产生计算机实现的过程,从而使得在计算机、其他可编程数据处理装置、或其他设备上执行的指令实现流程图和/或框图中的一个或多个方框中规定的功能/动作。
附图中的流程图和框图显示了根据本公开的多个实施例的***、方法和计算机程序产品的可能实现的体系架构、功能和操作。在这点上,流程图或框图中的每个方框可以代表一个模块、程序段或指令的一部分,模块、程序段或指令的一部分包含一个或多个用于实现规定的逻辑功能的可执行指令。在有些作为替换的实施例中,方框中所标注的功能也可以以不同于附图中所标注的顺序发生。例如,两个连续的方框实际上可以基本并行地执行,它们有时也可以按相反的顺序执行,这依所涉及的功能而定。也要注意的是,框图和/或流程图中的每个方框、以及框图和/或流程图中的方框的组合,可以用执行规定的功 能或动作的专用的基于硬件的***来实现,或者可以用专用硬件与计算机指令的组合来实现。
以上已经描述了本公开的各实施例,上述说明是示例性的,并非穷尽性的,并且也不限于所公开的各实施例。在不偏离所说明的各实施例的范围和精神的情况下,对于本技术领域的普通技术人员来说许多修改和变更都是显而易见的。本文中所用术语的选择,旨在最好地解释各实施例的原理、实际应用或对市场中的技术的改进,或者使本技术领域的其他普通技术人员能理解本文公开的各个实现方式。

Claims (18)

  1. 一种用于确定响应语句的方法,包括:
    基于源语句序列和候选响应语句,根据响应确定模型,确定所述候选响应语句与对应于所述源语句序列的正确响应语句之间的匹配得分;
    基于所述匹配得分和所述候选响应语句的标注信息来确定第一损失,所述第一损失表示所述匹配得分与所述标注信息之间的相似程度,所述标注信息表示所述候选响应语句是否是所述正确响应语句;
    基于源对话序列、正样本对话序列和负样本对话序列来确定第二损失,所述源对话序列包括所述源语句序列和所述候选响应语句,所述正样本对话序列至少包括所述正确响应语句,所述负样本对话序列包括所述源语句序列和针对所述源语句序列的错误响应语句,所述第二损失表示相比所述负样本对话序列,所述正样本对话序列与所述源对话序列之间的相似程度;以及
    基于所述第一损失和所述第二损失的值,训练所述响应确定模型。
  2. 根据权利要求1所述的方法,还包括:
    基于所述源语句序列和所述正确响应语句来确定所述正样本对话序列。
  3. 根据权利要求2所述的方法,其中确定所述正样本对话序列包括:
    交换所述源语句序列中的第一语句和第二语句的顺序,所述第一语句和所述第二语句不是所述源语句序列中的最后语句;以及
    基于经交换语句顺序的所述源语句序列和所述正确响应语句,确定所述正样本对话序列。
  4. 根据权利要求2所述的方法,其中确定所述正样本对话序列包括:
    改变所述源语句序列中的第三语句中的词语的顺序,所述第三语句不是所述源语句序列中的最后语句;以及
    基于经改变词语顺序的所述源语句序列和所述正确响应语句,确 定所述正样本对话序列。
  5. 根据权利要求1所述的方法,其中所述响应确定模型包括预训练的语言模型。
  6. 根据权利要求5所述的方法,其中确定所述第二损失包括:
    基于所述源对话序列、所述正样本对话序列和所述负样本对话序列,根据所述预训练的语言模型,分别确定源对话标记序列、正样本对话标记序列和负样本对话标记序列;以及
    基于所述源对话标记序列与所述正样本对话标记序列之间的相似程度以及所述源对话标记序列与所述负样本对话标记序列之间的相似程度,确定所述第二损失。
  7. 根据权利要求5所述的方法,其中所述响应确定模型还包括预训练的分类器;并且
    其中确定所述匹配得分包括:
    基于所述源语句序列和所述候选响应语句,根据所述预训练的语言模型,确定源对话分类标记;以及
    基于所述源对话分类标记,根据所述预训练的分类器,确定所述匹配得分。
  8. 根据权利要求7所述的方法,其中训练所述响应确定模型包括:
    通过最小化所述第一损失与所述第二损失的和,调整所述预训练的分类器和所述预训练的语言模型各自的参数,以获得经训练的所述响应确定模型。
  9. 一种电子设备,包括:
    至少一个处理单元;以及
    至少一个存储器,所述至少一个存储器被耦合到所述至少一个处理单元并且存储用于由所述至少一个处理单元执行的指令,所述指令在由所述至少一个处理单元执行时使所述电子设备执行以下动作:
    基于第一参考图像,根据第一模型,生成所述第一参考图像的第一特征表示;
    基于源语句序列和候选响应语句,根据响应确定模型,确定 所述候选响应语句与对应于所述源语句序列的正确响应语句之间的匹配得分;
    基于所述匹配得分和所述候选响应语句的标注信息来确定第一损失,所述第一损失表示所述匹配得分与所述标注信息之间的相似程度,所述标注信息表示所述候选响应语句是否是所述正确响应语句;
    基于源对话序列、正样本对话序列和负样本对话序列来确定第二损失,所述源对话序列包括所述源语句序列和所述候选响应语句,所述正样本对话序列至少包括所述正确响应语句,所述负样本对话序列包括所述源语句序列和针对所述源语句序列的错误响应语句,所述第二损失表示相比所述负样本对话序列,所述正样本对话序列与所述源对话序列之间的相似程度;以及
    基于所述第一损失和所述第二损失的值,训练所述响应确定模型。
  10. 根据权利要求9所述的电子设备,其中所述动作还包括:
    基于所述源语句序列和所述正确响应语句来确定所述正样本对话序列。
  11. 根据权利要求10所述的电子设备,其中确定所述正样本对话序列包括:
    交换所述源语句序列中的第一语句和第二语句的顺序,所述第一语句和所述第二语句不是所述源语句序列中的最后语句;以及
    基于经交换语句顺序的所述源语句序列和所述正确响应语句,确定所述正样本对话序列。
  12. 根据权利要求10所述的电子设备,其中确定所述正样本对话序列包括:
    改变所述源语句序列中的第三语句中的词语的顺序,所述第三语句不是所述源语句序列中的最后语句;以及
    基于经改变词语顺序的所述源语句序列和所述正确响应语句,确定所述正样本对话序列。
  13. 根据权利要求9所述的电子设备,其中所述响应确定模型包括预训练的语言模型。
  14. 根据权利要求13所述的电子设备,其中确定所述第二损失包括:
    基于所述源对话序列、所述正样本对话序列和所述负样本对话序列,根据所述预训练的语言模型,分别确定源对话标记序列、正样本对话标记序列和负样本对话标记序列;以及
    基于所述源对话标记序列与所述正样本对话标记序列之间的相似程度以及所述源对话标记序列与所述负样本对话标记序列之间的相似程度,确定所述第二损失。
  15. 根据权利要求13所述的电子设备,其中所述响应确定模型还包括预训练的分类器;并且
    其中确定所述匹配得分包括:
    基于所述源语句序列和所述候选响应语句,根据所述预训练的语言模型,确定源对话分类标记;以及
    基于所述源对话分类标记,根据所述预训练的分类器,确定所述匹配得分。
  16. 根据权利要求15所述的电子设备,其中训练所述响应确定模型包括:
    通过最小化所述第一损失与所述第二损失的和,调整所述预训练的分类器和所述预训练的语言模型各自的参数,以获得经训练的所述响应确定模型。
  17. 一种用于确定响应语句的装置,包括
    匹配得分确定模块,被配置为基于源语句序列和候选响应语句,根据响应确定模型,确定所述候选响应语句与对应于所述源语句序列的正确响应语句之间的匹配得分;
    第一损失确定模块,被配置为基于所述匹配得分和所述候选响应语句的标注信息来确定第一损失,所述第一损失表示所述匹配得分与所述标注信息之间的相似程度,所述标注信息表示所述候选响应语句 是否是所述正确响应语句;
    第二损失确定模块,被配置为基于源对话序列、正样本对话序列和负样本对话序列来确定第二损失,所述源对话序列包括所述源语句序列和所述候选响应语句,所述正样本对话序列至少包括所述正确响应语句,所述负样本对话序列包括所述源语句序列和针对所述源语句序列的错误响应语句,所述第二损失表示相比所述负样本对话序列,所述正样本对话序列与所述源对话序列之间的相似程度;以及
    模型训练模块,被配置为基于所述第一损失和所述第二损失的值,来训练所述响应确定模型。
  18. 一种计算机可读存储介质,其上存储有计算机程序,所述程序被处理器执行时实现根据权利要求1至8中任一项所述的方法。
PCT/CN2022/118787 2021-10-29 2022-09-14 用于确定响应语句的方法、设备、装置和介质 WO2023071581A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111275265.9 2021-10-29
CN202111275265.9A CN114020887B (zh) 2021-10-29 2021-10-29 用于确定响应语句的方法、设备、装置和介质

Publications (1)

Publication Number Publication Date
WO2023071581A1 true WO2023071581A1 (zh) 2023-05-04

Family

ID=80058897

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/118787 WO2023071581A1 (zh) 2021-10-29 2022-09-14 用于确定响应语句的方法、设备、装置和介质

Country Status (2)

Country Link
CN (1) CN114020887B (zh)
WO (1) WO2023071581A1 (zh)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114020887B (zh) * 2021-10-29 2023-11-07 北京有竹居网络技术有限公司 用于确定响应语句的方法、设备、装置和介质
CN114969306B (zh) * 2022-05-31 2024-04-05 平安科技(深圳)有限公司 医疗信息推荐模型的训练方法、装置、设备及存储介质

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111737952A (zh) * 2020-06-24 2020-10-02 深圳前海微众银行股份有限公司 一种序列标注模型的训练方法及装置
CN112199479A (zh) * 2020-09-15 2021-01-08 北京捷通华声科技股份有限公司 优化语言语义理解模型方法、装置、设备及存储介质
US20210182902A1 (en) * 2019-12-11 2021-06-17 Google Llc Methods, systems, and media for automated compliance determination of content items
CN114020887A (zh) * 2021-10-29 2022-02-08 北京有竹居网络技术有限公司 用于确定响应语句的方法、设备、装置和介质

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110162596B (zh) * 2019-04-01 2023-07-14 腾讯科技(深圳)有限公司 自然语言处理的训练方法、装置、自动问答方法和装置
CN112347278A (zh) * 2019-10-25 2021-02-09 北京沃东天骏信息技术有限公司 用于训练表征模型的方法和装置
CN111930992B (zh) * 2020-08-14 2022-10-28 腾讯科技(深圳)有限公司 神经网络训练方法、装置及电子设备
CN112287069B (zh) * 2020-10-29 2023-07-25 平安科技(深圳)有限公司 基于语音语义的信息检索方法、装置及计算机设备
CN112860848B (zh) * 2021-01-20 2022-03-25 平安科技(深圳)有限公司 信息检索方法、装置、设备及介质
CN112966102A (zh) * 2021-02-10 2021-06-15 万翼科技有限公司 分类模型构建及文本语句分类方法、设备及存储介质

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20210182902A1 (en) * 2019-12-11 2021-06-17 Google Llc Methods, systems, and media for automated compliance determination of content items
CN111737952A (zh) * 2020-06-24 2020-10-02 深圳前海微众银行股份有限公司 一种序列标注模型的训练方法及装置
CN112199479A (zh) * 2020-09-15 2021-01-08 北京捷通华声科技股份有限公司 优化语言语义理解模型方法、装置、设备及存储介质
CN114020887A (zh) * 2021-10-29 2022-02-08 北京有竹居网络技术有限公司 用于确定响应语句的方法、设备、装置和介质

Also Published As

Publication number Publication date
CN114020887A (zh) 2022-02-08
CN114020887B (zh) 2023-11-07

Similar Documents

Publication Publication Date Title
US10726061B2 (en) Identifying text for labeling utilizing topic modeling-based text clustering
US11113479B2 (en) Utilizing a gated self-attention memory network model for predicting a candidate answer match to a query
JP6444530B2 (ja) 音声言語理解システム
WO2023071581A1 (zh) 用于确定响应语句的方法、设备、装置和介质
CN109887484B (zh) 一种基于对偶学习的语音识别与语音合成方法及装置
CN113811946A (zh) 数字序列的端到端自动语音识别
Rahul et al. Biomedical event trigger identification using bidirectional recurrent neural network based models
US11557286B2 (en) Speech recognition method and apparatus
JP6461308B2 (ja) 音声認識装置およびリスコアリング装置
Pramanik et al. Text normalization using memory augmented neural networks
JP2023544336A (ja) 多言語発話認識フレームワークのためのシステム及び方法
KR102024845B1 (ko) 화행 분석 장치 및 방법
US20230104228A1 (en) Joint Unsupervised and Supervised Training for Multilingual ASR
WO2014073206A1 (ja) 情報処理装置、及び、情報処理方法
US11790018B1 (en) Apparatus for attribute traversal
Anantaram et al. Repairing ASR output by Artificial Development and Ontology based Learning.
WO2023028066A1 (en) System and method for a natural language understanding system based on iterative intent detection and slot filling neural layers
EP4315319A1 (en) Supervised and unsupervised training with contrastive loss over sequences
WO2023035883A1 (zh) 用于文档和摘要的一致性检测的方法、设备和介质
WO2023061107A1 (zh) 基于层预测的语言翻译的方法、设备、装置和介质
Tanaka et al. Neural candidate-aware language models for speech recognition
Bai et al. A public Chinese dataset for language model adaptation
Reddy et al. Learning not to Discriminate: Task Agnostic Learning for Improving Monolingual and Code-switched Speech Recognition
Kolehmainen et al. Personalization for bert-based discriminative speech recognition rescoring
US20230116268A1 (en) System and a method for phonetic-based transliteration

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22885468

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE