WO2023185912A1 - Dialogue summary generation method and device, and model training method and device - Google Patents

Dialogue summary generation method and device, and model training method and device

Info

Publication number
WO2023185912A1
Authority
WO
WIPO (PCT)
Prior art keywords
model
dialogue
content
summary generation
full
Prior art date
Application number
PCT/CN2023/084642
Other languages
English (en)
French (fr)
Inventor
邹炎炎
张海楠
陈宏申
丁卓冶
龙波
Original Assignee
北京沃东天骏信息技术有限公司
北京京东世纪贸易有限公司
Priority date
Filing date
Publication date
Application filed by 北京沃东天骏信息技术有限公司 and 北京京东世纪贸易有限公司
Publication of WO2023185912A1 publication Critical patent/WO2023185912A1/zh

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/34Browsing; Visualisation therefor
    • G06F16/345Summarisation for human users
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • G06F40/289Phrasal analysis, e.g. finite state techniques or chunking
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/30Semantic analysis
    • G06F40/35Discourse or dialogue representation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Definitions

  • the present disclosure relates to the field of dialogue summary generation, and in particular to a dialogue summary generation method and device, and a model training method and device.
  • a conversation summary generation method including:
  • the use of contrastive learning to model semantic coherence, determine dialogue topic information, and generate a dialogue summary generation model includes:
  • the use of contrastive learning to model semantic coherence, determine dialogue topic information, and generate a dialogue summary generation model also includes:
  • the dialogue sentence coherence detection model, the sub-summary generation model and the full-text summary generation model are combined for model training using alternating parameter updates.
  • the method of using alternating parameter updates to train the dialogue sentence coherence detection model, sub-summary generation model and full-text summary generation model includes:
  • the objective function of the dialogue sentence coherence detection model, the objective function of the sub-summary generation model and the objective function of the full-text summary generation model are used to sequentially update the parameters to train the dialogue sentence coherence detection model, the sub-summary generation model and the full-text summary generation model.
  • the dialogue sentence coherence detection model and the subsummary generation model are used as auxiliary tasks to improve the quality of the summary generated by the full-text summary generation model.
  • the use of contrastive learning to model semantic coherence, determine dialogue topic information, and generate a dialogue summary generation model also includes:
  • preprocessing the conversation content includes:
  • the word segmenter of the pre-trained model segments the spliced dialogue content into words, and the first predetermined number of words are retained as input to the model.
  • building corresponding model training data according to the needs of the dialogue content understanding model, sub-summary generation model and full-text summary generation model includes:
  • the window constructed based on the continuous dialogue sentences in the dialogue content is used as the positive example model training data, and the data obtained by shuffling the order of the dialogue sentences in the window and re-splicing them is used as the negative example model training data;
  • the entire conversation content is used as input to the model and the output is a complete summary.
  • building a coherence detection model modeling the switching relationship between different topics in the conversation content by learning semantic coherence, and obtaining topic segmentation information of the conversation content includes:
  • constructing a subsummary generation model to generate corresponding subsummaries for each topic of the conversation content includes:
  • a third predetermined number of predetermined positive and negative example pairs are randomly selected for training
  • the objective function of the sub-summary generation model is determined by the contrastive-learning-based margin loss function.
  • building a full-text summary generation model to generate a full-text dialogue summary for the dialogue content includes:
  • the training goal of the full-text summary generation model is set to learn optimal model parameters and minimize the negative log-likelihood function value
  • a model training method including:
  • the dialogue summary generation model is trained using alternating parameter updates, so that the dialogue summary generation model outputs summary information of the target dialogue content based on the input target dialogue content.
  • the use of contrastive learning to model semantic coherence, determine dialogue topic information, and generate a dialogue summary generation model includes:
  • the model training of the dialogue summary generation model using alternating parameter updates includes:
  • the objective function of the dialogue sentence coherence detection model, the objective function of the sub-summary generation model and the objective function of the full-text summary generation model are used to sequentially update parameters to train the dialogue summary generation model.
  • the dialogue summary generation model includes a dialogue sentence coherence detection model, a sub-summary generation model and a full-text summary generation model.
  • the dialogue sentence coherence detection model and the subsummary generation model are used as auxiliary tasks to improve the quality of the summary generated by the full-text summary generation model.
  • the use of contrastive learning to model semantic coherence, determine dialogue topic information, and generate a dialogue summary generation model also includes:
  • preprocessing the conversation content includes:
  • the word segmenter of the pre-trained model segments the spliced dialogue content into words, and the first predetermined number of words are retained as input to the model.
  • building corresponding model training data according to the needs of the dialogue content understanding model, sub-summary generation model and full-text summary generation model includes:
  • the window constructed based on the continuous dialogue sentences in the dialogue content is used as the positive example model training data, and the data obtained by shuffling the order of the dialogue sentences in the window and re-splicing them is used as the negative example model training data;
  • the entire conversation content is used as input to the model and the output is a complete summary.
  • building a coherence detection model modeling the switching relationship between different topics in the conversation content by learning semantic coherence, and obtaining topic segmentation information of the conversation content includes:
  • a second predetermined number of predetermined positive and negative example pairs are randomly selected and the coherence loss is calculated based on contrastive learning
  • constructing a subsummary generation model to generate corresponding subsummaries for each topic of the conversation content includes:
  • a third predetermined number of predetermined positive and negative example pairs are randomly selected for training
  • the objective function of the sub-summary generation model is determined by the contrastive-learning-based margin loss function.
  • building a full-text summary generation model to generate a full-text dialogue summary for the dialogue content includes:
  • the training goal of the full-text summary generation model is set to learn optimal model parameters and minimize the negative log-likelihood function value
  • a conversation summary generating device including:
  • the model generation module is used to model semantic coherence using contrastive learning, determine dialogue topic information, and generate a dialogue summary generation model
  • the dialogue summary determination module is used to input the target dialogue content into the dialogue summary generation model to obtain summary information of the target dialogue content.
  • the conversation summary generating device is configured to perform operations that implement the conversation summary generating method described in any of the above embodiments.
  • a model training device including:
  • the model generation unit is used to model semantic coherence using contrastive learning, determine dialogue topic information, and generate a dialogue summary generation model
  • the model training unit is used to perform model training on the dialogue summary generation model using an alternating parameter update method, so that the dialogue summary generation model outputs summary information of the target dialogue content based on the input target dialogue content.
  • the dialogue model training device is used to perform operations that implement the model training method described in any of the above embodiments.
  • a computer device including:
  • Memory used to store instructions
  • a processor configured to execute the instructions, causing the computer device to perform operations that implement the conversation summary generation method as described in any of the above embodiments, and/or the model training method as described in any of the above embodiments.
  • a non-transitory computer-readable storage medium storing computer instructions which, when executed by a processor, implement the conversation summary generation method and/or the model training method described in any of the above embodiments.
  • Figure 1 is a schematic diagram of some embodiments of the disclosed conversation summary generation method.
  • Figure 2 is a schematic diagram of conversation content and summary in some embodiments of the present disclosure.
  • Figure 3 is a schematic diagram of some embodiments of the model training method of the present disclosure.
  • Figure 4 is a schematic diagram of other embodiments of the model training method of the present disclosure.
  • Figure 5 is a schematic diagram of some embodiments of a conversation summary generating device of the present disclosure.
  • Figure 6 is a schematic diagram of some embodiments of the model training device of the present disclosure.
  • Figure 7 is a schematic diagram of other embodiments of the model training device of the present disclosure.
  • Figure 8 is a schematic structural diagram of some embodiments of a computer device of the present disclosure.
  • any specific values are to be construed as illustrative only and not as limiting. Accordingly, other examples of the exemplary embodiments may have different values.
  • the present disclosure provides a dialogue summary generation method and device, a model training method and equipment, which can implicitly learn the topic information of the dialogue content and generate summaries for different topics.
  • the present disclosure will be described below through specific embodiments.
  • Figure 1 is a schematic diagram of some embodiments of the disclosed conversation summary generation method.
  • this embodiment can be executed by the disclosed conversation summary generating device or the disclosed computer device.
  • the method may include at least one of the following steps, wherein:
  • Step 100 Use contrastive learning to model semantic coherence, determine dialogue topic information, and generate a dialogue summary generation model.
  • Step 200 Input the target dialogue content into the dialogue summary generation model to obtain summary information of the target dialogue content.
  • FIG 2 is a schematic diagram of conversation content and summary in some embodiments of the present disclosure.
  • the conversation content has three topics, and the corresponding topic segments are marked with S1, S2 and S3 respectively.
  • S1 is the “current situation” (lines 1-7 in Figure 2)
  • S2 is the “arrival time” (lines 8-10 in Figure 2)
  • S3 is “food to eat” (lines 11-18 in Figure 2).
  • Lines 19-21 in Figure 2 are summaries of each topic.
  • the central idea of each topic is summarized in one sentence, that is, t1 represents the summary of S1, t2 represents the summary of S2, and t3 represents the summary of S3.
  • sentences from the same topic are more coherent than sentences from different topics (for example, inter-topic segments S4 and S5).
  • This point reveals a potential relationship between topic and sentence coherence.
  • although paragraphs or sections can be treated as natural topic segments, it is difficult to accurately segment conversation topics.
  • the above embodiments of the present disclosure propose to implicitly capture dialogue topic information by modeling discourse coherence in a contrastive learning manner.
  • the above-described embodiments of the present disclosure build a coherence detection module to push the model to pay more attention to segments that are more coherent and may contain significant information from the same topic.
  • the present disclosure can implicitly learn topic information of conversation content and generate summaries for different topics without additional annotations.
  • Both modules of the above-mentioned embodiments of the present disclosure are constructed in a contrastive learning manner and do not require additional manual annotations or additional algorithms. These two modules of the above embodiments of the present disclosure can be combined with the full-text dialogue summarization task through an alternating parameter update strategy, thereby forming the final model of the present disclosure.
  • step 100 of the embodiment of Figure 1 may include: a model training method. That is, the dialogue summary generation model is trained using alternating parameter updates.
  • Figure 3 is a schematic diagram of some embodiments of the model training method of the present disclosure.
  • this embodiment can be executed by the disclosed model training device, the disclosed conversation summary generating device, or the disclosed computer device.
  • the disclosed model training method or the disclosed dialogue summary generating method (such as step 100 in the embodiment of Figure 1) may include at least one of steps 101 to 102, wherein:
  • Step 101 Use contrastive learning to model semantic coherence, determine dialogue topic information, and generate a dialogue summary generation model.
  • Step 102 Perform model training on the dialogue summary generation model by using an alternating parameter update method, so that the dialogue summary generation model outputs summary information of the target dialogue content based on the input target dialogue content.
  • Figure 4 is a schematic diagram of other embodiments of the model training method of the present disclosure.
  • this embodiment can be executed by the disclosed model training device, the disclosed conversation summary generating device, or the disclosed computer device.
  • the disclosed model training method or the disclosed dialogue summary generating method (such as step 100 in the embodiment of Figure 1) may include at least one step from step 110 to step 170, wherein:
  • Step 110 Preprocess the conversation content.
  • the training data for the conversation summary generation task includes two parts, namely the conversation content and its corresponding summary.
  • Step 110 mainly performs preprocessing on the dialogue content.
  • step 110 may include at least one of steps 111-112, wherein:
  • Step 111 Add speaker information to the speech content of different speakers in the conversation content, and then splice them together.
  • Step 112 Use the word segmenter of the pre-trained model to segment the spliced dialogue content into words, and retain a first predetermined number of words as input to the model.
  • the word segmenter of the pre-trained model may be BART (Bidirectional and Auto-Regressive Transformers).
  • the pre-training model may be BART, STEP, PEGASUS (Pre-training with Extracted Gap-sentences for Abstractive Summarization) or other models.
  • the first predetermined number of words may be the first 1024 words.
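As a rough illustration of steps 111-112, the sketch below tags each utterance with its speaker, splices the turns together, and keeps only the first 1024 tokens. The function name and the use of `str.split` are illustrative assumptions; a real system would use the pre-trained model's subword tokenizer (e.g. BART's).

```python
def preprocess_dialogue(turns, max_tokens=1024):
    """Splice speaker-tagged utterances and truncate to the model's input budget.

    turns: list of (speaker, utterance) pairs.
    str.split is a stand-in for a subword tokenizer such as BART's.
    """
    spliced = " ".join(f"{speaker}: {utterance}" for speaker, utterance in turns)
    tokens = spliced.split()          # placeholder for subword tokenization
    return tokens[:max_tokens]        # keep only the first `max_tokens` tokens

turns = [("Hania", "I will be there around 7pm I guess :("),
         ("Julia", "I will be waiting! :*"),
         ("Hania", "Great!")]
tokens = preprocess_dialogue(turns)
```

With a subword tokenizer the truncation point would fall on subword units rather than whitespace words, but the preprocessing shape is the same.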
  • Step 120 Construct corresponding different forms of model training data according to the needs of the dialogue content understanding model, sub-summary generation model and full-text summary generation model.
  • step 120 may include at least one of steps 121 to 123, wherein:
  • Step 121 For the dialogue content understanding model, the window constructed based on the continuous dialogue sentences in the dialogue content is used as the positive example model training data, and the data obtained by shuffling the order of the dialogue sentences in the window and re-splicing them is used as the negative example model training data.
  • step 121 for the dialogue content understanding model, construct training data according to the dialogue content, without involving the summary part.
  • step 121 may include: taking a window w constructed from k consecutive dialogue sentences as a positive example, and shuffling the order of the sentences in the window w and then re-splicing them as a negative example.
  • for example, take the window containing three consecutive dialogue sentences "Hania: I will be there around 7pm I guess :( Julia: I will be waiting! :* Hania: Great!" as the positive example; shuffling the order of the sentences in the window and then re-splicing them gives "Hania: Great! Hania: I will be there around 7pm I guess :( Julia: I will be waiting! :*", and this newly obtained sentence window serves as a negative example.
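The positive/negative window construction of step 121 can be sketched as follows; `build_coherence_pairs`, the window size `k`, and the re-shuffle loop are illustrative choices, not the disclosure's exact procedure.

```python
import random

def build_coherence_pairs(sentences, k=3, seed=0):
    """Build (positive, negative) windows for the coherence detection task.

    Positive: k consecutive dialogue sentences spliced in order.
    Negative: the same window with its sentence order shuffled, then re-spliced.
    """
    rng = random.Random(seed)
    pairs = []
    for start in range(len(sentences) - k + 1):
        window = sentences[start:start + k]
        neg_window = window[:]
        rng.shuffle(neg_window)
        # re-shuffle if the order did not change (only possible when sentences differ)
        while neg_window == window and len(set(window)) > 1:
            rng.shuffle(neg_window)
        pairs.append((" ".join(window), " ".join(neg_window)))
    return pairs
```

Sliding the window over the whole dialogue yields one positive/negative pair per position, matching the claim that the negative example re-splices the scrambled window content.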
  • Step 122 For the sub-summary generation model, corresponding positive model training data and negative model training data are generated for each topic of the dialogue content.
  • step 122 may include constructing training data for the topic subsummary generation model by combining the joint conversation content and the conversation summary.
  • a summary of a long conversation always consists of multiple sentences, each of which is considered a subsummary.
  • this disclosure assumes that each subsummary is related to one topic.
  • the summary can be split into sub-summaries according to sentence terminators such as periods, question marks, exclamation points, etc.
  • the present disclosure can retrieve the most relevant segment S_i^pos from dialogue D based on ROUGE-2 recall scores.
  • the present disclosure thus constructs the training data {(S_i^pos, t_i), (S_i^neg, t_i)}.
  • for example, taking the dialogue content D = {Hania: I will be there around 7pm I guess :( Julia: I will be waiting! :* Hania: Great.} as the model input and its matched sub-summary as the model output constitutes a positive example of training data.
  • taking the dialogue in Figure 2 as an example, if the dialogue content "Julia: Where are you ... Julia: I know how you feel love, I am sick of trains already :(" is the model input and the sub-summary "She will get there around 7pm." is the model output, this combination of input and output constitutes a negative example of training data.
  • step 122 may include: the training data construction module using the ROUGE-2 recall score to match each sub-summary with a conversation content window when constructing the training data for the sub-summary generation module.
  • the ROUGE-2 recall score (an automatic summarization evaluation metric) can also be replaced with a BERTScore-based recall score.
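Step 122's matching of each sub-summary to its most relevant dialogue window via ROUGE-2 recall might look like the following sketch; the helper names are hypothetical, and a production system would use a full ROUGE implementation with proper tokenization and stemming.

```python
from collections import Counter

def rouge2_recall(candidate, reference):
    """ROUGE-2 recall: fraction of the reference's bigrams found in the candidate."""
    def bigrams(text):
        toks = text.lower().split()
        return [tuple(toks[i:i + 2]) for i in range(len(toks) - 1)]
    ref = bigrams(reference)
    if not ref:
        return 0.0
    # count overlapping bigrams with multiplicity, as in the standard definition
    overlap = Counter(ref) & Counter(bigrams(candidate))
    return sum(overlap.values()) / len(ref)

def most_relevant_window(windows, subsummary):
    """Pick the dialogue window with the highest ROUGE-2 recall w.r.t. the sub-summary."""
    return max(windows, key=lambda w: rouge2_recall(w, subsummary))
```

The selected window becomes the positive segment S_i^pos for sub-summary t_i; any other window can serve as a negative segment S_i^neg.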
  • Step 123 For the full-text summary generation model, the entire conversation content is used as model input and a complete summary is output.
  • Step 130 Construct a dialogue content understanding model, semantically encode the dialogue content, and train the dialogue content understanding model.
  • step 130 may include: using the network structure of BART's Transformer encoder to encode the conversation content into a semantic vector. For example, given the content of a dialogue (obtained by the training data construction module) D, step 130 will understand the dialogue content and output the semantic representation vector E of the dialogue content.
  • Step 140 Construct a coherence detection model, model the switching relationship between different topics in the dialogue content by learning semantic coherence, and obtain topic segmentation information of the dialogue content.
  • the coherence detection model is used to detect the semantic coherence of the dialogue within the input dialogue content window.
  • step 140 may include at least one of steps 141 to 143, wherein:
  • Step 141 Calculate the coherence scores of the positive model training data and the negative model training data of the coherence detection model respectively.
  • step 141 may include: constructing the positive example S^pos and the negative example S^neg based on the original conversation content (steps 120 and 130); inputting the positive and negative example data into the dialogue content understanding module to obtain their respective semantic vector representations, as shown in formula (1); then calculating the respective coherence scores co through formula (2), which also yields the normalized coherence scores of the positive and negative examples.
  • w_1 and b_1 are both model parameters.
  • the above embodiments of the present disclosure can calculate the correlation scores of positive and negative examples through formula (2).
  • Formula (2) uses the normalization property of softmax to calculate the normalized coherence scores of positive and negative examples.
  • Step 142 Randomly select a second predetermined number of predetermined positive and negative example pairs, and calculate the coherence loss based on contrastive learning.
  • the second predetermined number may be N_co.
  • step 142 may include: for a dialogue content D, N_co positive and negative example pairs are randomly selected during model training for simplicity; the coherence loss value based on contrastive learning is then calculated as shown in formula (3).
  • the margin coefficient in formula (3) expresses that the present disclosure expects the coherence score of the positive segment to be greater than the score of the negative segment.
  • k, N_co and the margin coefficient are all hyperparameters.
  • Step 143 Calculate the objective function of the coherence detection model based on the margin-based contrastive loss.
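A minimal, pure-Python sketch of the scoring and loss described around formulas (2) and (3): a linear layer maps a window's semantic vector to a scalar coherence score, and a margin-based contrastive loss pushes each positive window above its shuffled negative. The toy weights `w1`, `b1` and the default margin stand in for learned parameters and the margin hyperparameter; they are illustrative, not values from the disclosure.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

w1 = [0.5, -0.2, 0.1, 0.7]   # toy scoring-layer weights (learned in practice)
b1 = 0.0                     # toy bias

def coherence_score(e):
    """Scalar coherence score of a window's semantic vector (analogue of formula (2))."""
    return dot(w1, e) + b1

def coherence_loss(pairs, margin=1.0):
    """Margin-based contrastive loss over (E_pos, E_neg) vector pairs (analogue of
    formula (3)): each positive window should out-score its negative by `margin`."""
    losses = [max(0.0, margin - (coherence_score(ep) - coherence_score(en)))
              for ep, en in pairs]
    return sum(losses) / len(losses)
```

In the full model the vectors would come from the dialogue content understanding module's encoder, and the scores would additionally be softmax-normalized across the positive/negative pair.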
  • Step 150 Construct a subsummary generation model to generate corresponding subsummaries for each topic of the conversation content.
  • step 150 may include at least one of steps 151 to 154, wherein:
  • Step 151 Model the subsummary generation task as a sequence-to-sequence learning problem.
  • step 151 may include: determining negative log-likelihood values in the case of positive and negative example inputs according to formula (5) and formula (6).
  • Step 152 Determine the degree of irrelevance between the conversation summary fragment and the subsummary in the subsummary generation task.
  • step 152 may include: determining the degree of irrelevance of the conversation summary segment and the subsummary according to formula (7), where the normalized score after the softmax layer can be regarded as the irrelevant score.
  • This disclosure adopts a unified score to indicate the degree of irrelevance between a dialogue summary segment and the sub-summaries.
  • Step 153 During the training phase, a third predetermined number of predetermined positive and negative example pairs are randomly selected for training.
  • step 153 may include: given a conversation D and its corresponding summary T_D, where T_D contains m sub-summaries, at least m pairs of positive and negative examples can be constructed as training data; to simplify the calculation, in the training phase N_su pairs are randomly selected (N_su ≤ m).
  • Step 154 Determine the objective function of the sub-summary generation model based on the contrastive-learning-based margin loss function.
  • step 154 may include: for the dialogue content D, a marginal loss function based on contrastive learning may be constructed, as shown in formula (8).
  • through the margin coefficient, the present disclosure expects the correlation score between a positive segment and its sub-summary to be at least greater than the correlation score between the sub-summary and a negative segment.
  • N_su and the margin coefficient are hyperparameters.
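Steps 151-154 can be sketched as follows, assuming the seq2seq model supplies a negative log-likelihood (NLL) of the sub-summary under each segment (formulas (5)-(6)); a softmax over the two NLLs gives irrelevance scores in the spirit of formula (7), and a hinge with a margin implements a contrastive objective in the spirit of formula (8). The function names and the margin value are assumptions for illustration.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def subsummary_loss(pair_nlls, margin=0.5):
    """Margin loss for the sub-summary task over N_su (nll_pos, nll_neg) pairs.

    Each pair holds the seq2seq NLL of the sub-summary given the matched (positive)
    and mismatched (negative) dialogue segment. Softmax-normalizing the two NLLs
    yields irrelevance scores (higher = segment explains the sub-summary worse);
    the hinge asks the positive segment to be less irrelevant than the negative.
    """
    total = 0.0
    for nll_pos, nll_neg in pair_nlls:
        irr_pos, irr_neg = softmax([nll_pos, nll_neg])
        total += max(0.0, margin + irr_pos - irr_neg)
    return total / len(pair_nlls)
```

When the positive segment fits the sub-summary far better (much lower NLL), its irrelevance score approaches 0 and the hinge contributes no loss.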
  • Step 160 Construct a full-text summary generation model to generate a full-text dialogue summary for the dialogue content.
  • the full-text summary generation model uses the conversation content and the corresponding complete summaries as training data.
  • step 160 may include at least one of steps 161-162, wherein:
  • Step 161 Model the full-text summary generation task as a sequence-to-sequence learning problem.
  • Step 162 Set the training goal of the full-text summary generation model to learn optimal model parameters and minimize the negative log-likelihood function value.
  • the dialogue sentences, together with their corresponding summary T_D = (y_1, y_2, ..., y_n), constitute the training data.
  • y_{1:i-1} represents the first i-1 tokens of the output sequence preceding y_i, i.e., y_{1:i-1} = (y_1, y_2, ..., y_{i-1}).
  • Step 163 Determine the objective function of the full-text summary generation model.
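The negative log-likelihood objective of steps 161-163 reduces to the sum below, where `token_probs` holds the model's probability for each gold token y_i given the prefix y_{1:i-1} and the dialogue D; training seeks parameters that minimize this value. The function name is illustrative.

```python
import math

def sequence_nll(token_probs):
    """Negative log-likelihood of an output sequence.

    token_probs[i] is the model's probability p(y_{i+1} | y_{1:i}, D) for the
    gold token. Minimizing the sum of -log p over all tokens is the training
    objective of the full-text summary generation model.
    """
    return -sum(math.log(p) for p in token_probs)
```

A perfectly confident model (all probabilities 1.0) attains the minimum NLL of zero; any uncertainty on a gold token adds -log p to the loss.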
  • Step 170 The dialogue sentence coherence detection model, the sub-summary generation model and the full-text summary generation model are combined to perform model training using alternating parameter updates.
  • step 170 may include at least one of steps 171-172, wherein:
  • Step 171 During the training process, use the objective function of the dialogue sentence coherence detection model, the objective function of the sub-summary generation model, and the objective function of the full-text summary generation model to sequentially update parameters, so as to train the dialogue sentence coherence detection model, the sub-summary generation model and the full-text summary generation model.
  • Step 172 During the training process, two objectives based on contrastive learning are used as auxiliary tasks to improve the quality of the summary generated by the full-text summary generation model. That is, the dialogue sentence coherence detection model and the subsummarization generation model serve as auxiliary tasks that can contribute to the main dialogue summarization task in the training phase.
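The alternating parameter update of steps 170-172 can be sketched as a loop that applies the three objectives in sequence on each batch; the `step_*` callables are assumed to each perform one gradient update against the shared model and return their loss (all names here are illustrative, not from the disclosure).

```python
def alternating_training(batches, step_coherence, step_subsummary, step_fulltext,
                         epochs=1):
    """Alternating parameter updates over three objectives.

    Per batch, the coherence detection and sub-summary objectives (auxiliary
    tasks) and then the full-text summarization objective (main task) each
    update the shared model once; their losses are recorded for monitoring.
    """
    history = []
    for _ in range(epochs):
        for batch in batches:
            l_co = step_coherence(batch)    # auxiliary task 1: coherence detection
            l_su = step_subsummary(batch)   # auxiliary task 2: sub-summary generation
            l_fu = step_fulltext(batch)     # main task: full-text summarization
            history.append((l_co, l_su, l_fu))
    return history
```

Because the auxiliary tasks only shape the shared parameters during training, they can be dropped at inference time, leaving just the content understanding and full-text summarization path, as the surrounding text notes.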
  • the dialogue sentence coherence detection module and the sub-summary generation module serve as auxiliary tasks to help the full-text summary generation module improve the quality of generated summaries.
  • in the prediction phase, these two models and their corresponding data preprocessing modules are no longer needed.
  • the present disclosure only requires a conversation content understanding module and a full-text summary generation module to understand the conversation content and produce a summary.
  • the backbone network structure of the present disclosure is a transformer network, which is mainly used to understand input text and generate corresponding summaries.
  • the transformer network structure can also be replaced by an end-to-end generation model based on RNN (Recurrent Neural Network) or CNN (Convolutional Neural Network).
  • the dialogue summary generation method and model training method provided by the above embodiments of the present disclosure are based on the idea of contrastive learning, and propose to model the switching relationship between different topics in the dialogue content by learning semantic coherence, thereby implicitly obtaining the topic segmentation information of the dialogue content; generating summaries of conversation content on the same topic requires no additional annotation information or a priori algorithms.
  • the end-to-end summary generation method based on multi-task learning can overcome the error-propagation shortcoming of the pipeline method (that is, first obtaining the conversation topic information and then generating the summary based on that topic information).
  • the above embodiments of the present disclosure propose a contrastive learning method based on the coherence of dialogue content to implicitly model the topic information of the dialogue content.
  • the above embodiments of the present disclosure introduce the concept of sub-summary and design a sub-summary generation method based on contrastive learning, which can effectively help the model generate sub-summaries for the content of the same topic after obtaining the topic information of the dialogue content.
  • the above-mentioned embodiments of the present disclosure adopt a construction method of positive and negative examples for the training data of the dialogue content coherence detection module and the sub-summary generation module.
  • the above embodiments of the present disclosure combine, through multi-task learning, the two contrastive objective functions (i.e., the objective functions of the dialogue sentence coherence detection module and the sub-summary generation module) with the objective function of the full-text summary generation module, which supports end-to-end dialogue summary generation by the model and avoids error propagation.
  • Figure 5 is a schematic diagram of some embodiments of a conversation summary generating device of the present disclosure.
  • the dialogue summary generation device of the present disclosure may include a model generation module 51 and a dialogue summary determination module 52, where:
  • the model generation module 51 is used to model semantic coherence using contrastive learning, determine dialogue topic information, and generate a dialogue summary generation model.
  • the model generation module 51 may be implemented as the model training device of the present disclosure.
  • the dialogue summary determination module 52 is used to input the target dialogue content into the dialogue summary generation model to obtain summary information of the target dialogue content.
  • the conversation summary generating device is configured to perform operations that implement the conversation summary generating method described in any of the above embodiments.
  • Figure 6 is a schematic diagram of some embodiments of the model training device of the present disclosure.
  • the model training device of the present disclosure (such as the model generation module 51 of the embodiment of Figure 5) may include a model generation unit 61 and a model training unit 62, where:
  • the model generation unit 61 is used to model semantic coherence using contrastive learning, determine dialogue topic information, and generate a dialogue summary generation model.
  • the model training unit 62 is configured to perform model training on the dialogue summary generation model in an alternating parameter update manner, so that the dialogue summary generation model outputs summary information of the target dialogue content according to the input target dialogue content.
  • Figure 7 is a schematic diagram of other embodiments of the model training device of the present disclosure.
  • the model training device of the present disclosure may include a dialogue content preprocessing module 71, a training data construction module 72, a dialogue content understanding module 73, a dialogue sentence coherence detection module 74, a topic sub-summary generation module 75, a full-text dialogue summary generation module 76, and a multi-task learning module 77.
  • the conversation content preprocessing module 71 is used to preprocess the conversation content.
  • the dialogue content preprocessing module 71 can be used to add speaker information to the utterances of the different speakers in the dialogue content and splice them together, and to segment the spliced dialogue content into words using the tokenizer of the pre-trained model, retaining a first predetermined number of words as input to the model.
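The preprocessing described above can be sketched in a few lines. This is a minimal illustration, not the patent's implementation: the function name and the `(speaker, utterance)` input shape are invented for the example, and a plain whitespace split stands in for the pre-trained model's tokenizer (e.g., BART's); only the fixed token budget (1024 in the description) is taken from the source.

```python
def preprocess_dialogue(turns, tokenizer, max_tokens=1024):
    """Prefix each utterance with its speaker, splice the turns together,
    tokenize, and keep only the first `max_tokens` tokens.

    `turns` is a list of (speaker, utterance) pairs; `tokenizer` is any
    callable that splits a string into a list of tokens.
    """
    joined = " ".join(f"{speaker}: {utterance}" for speaker, utterance in turns)
    tokens = tokenizer(joined)
    return tokens[:max_tokens]

# Whitespace split standing in for a real subword tokenizer.
tokens = preprocess_dialogue(
    [("Julia", "Where are you?"), ("Hania", "That's a good question, haha")],
    tokenizer=str.split,
)
```

A real system would call the pre-trained tokenizer here so that truncation is measured in subword tokens rather than whitespace-separated words.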
  • the training data construction module 72 is used to construct corresponding model training data according to the requirements of the dialogue content understanding model, sub-summary generation model and full-text summary generation model.
  • the training data construction module 72 can be used as follows: for the dialogue content understanding model, a window built from consecutive dialogue sentences in the dialogue content serves as positive example training data, and the same window with its dialogue sentences shuffled and re-spliced serves as negative example training data; for the sub-summary generation model, corresponding positive and negative example training data are generated for each topic of the dialogue content; for the full-text summary generation model, the entire dialogue content is used as the model input and the output is the complete summary.
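The window-based positive/negative construction for the coherence task can be sketched as follows; the function name, the fixed seed, and the exact pairing scheme are illustrative assumptions, not the patent's specification:

```python
import random

def make_coherence_pairs(utterances, k, seed=0):
    """Build (positive, negative) training pairs for coherence detection:
    each window of k consecutive utterances is a positive example, and the
    same window with its utterance order shuffled is the paired negative."""
    rng = random.Random(seed)
    pairs = []
    for start in range(len(utterances) - k + 1):
        window = utterances[start:start + k]
        shuffled = window[:]
        while shuffled == window:  # make sure the order really changed
            rng.shuffle(shuffled)
        pairs.append((" ".join(window), " ".join(shuffled)))
    return pairs

pairs = make_coherence_pairs(["u1", "u2", "u3", "u4"], k=3)
```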
  • the dialogue content understanding module 73 is used to build a dialogue content understanding model, semantically encode the dialogue content, and train the dialogue content understanding model.
  • the dialogue sentence coherence detection module 74 is used to build a coherence detection model, model the switching relationship between different topics in the dialogue content by learning semantic coherence, and obtain topic segmentation information of the dialogue content.
  • the dialogue sentence coherence detection module 74 can be used to calculate the coherence scores of the positive and negative example training data of the coherence detection model respectively; randomly select a second predetermined number of positive-negative example pairs and compute the coherence loss based on contrastive learning; and calculate the objective function of the coherence detection model based on the margin contrastive loss.
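A margin-based contrastive loss of the kind described can be sketched as a simple hinge over paired coherence scores; the margin value and the plain-Python formulation (rather than a tensor library) are assumptions made for illustration:

```python
def margin_contrastive_loss(pos_scores, neg_scores, margin=0.5):
    """Hinge-style contrastive loss: each positive (coherent) window should
    out-score its paired negative (shuffled) window by at least `margin`."""
    return sum(
        max(0.0, margin - p + n) for p, n in zip(pos_scores, neg_scores)
    ) / len(pos_scores)

# Second pair violates the margin (0.8 vs 0.9), so it contributes loss.
loss = margin_contrastive_loss([0.9, 0.8], [0.2, 0.9])
```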
  • the topic sub-summary generation module 75 is used to build a sub-summary generation model and generate a corresponding sub-summary for each topic of the dialogue content.
  • the topic sub-summary generation module 75 can be used to model the sub-summary generation task as a sequence-to-sequence learning problem; determine the degree of irrelevance between a dialogue segment and a sub-summary in the sub-summary generation task; randomly select a third predetermined number of positive-negative example pairs for training in the training phase; and determine the objective function of the sub-summary generation model according to a margin loss function based on contrastive learning.
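The description elsewhere matches each sub-summary to its most relevant dialogue window by ROUGE-2 recall; the remaining windows supply negatives. A toy bigram-overlap sketch is shown below — the helper names are invented, and a production system would use a full ROUGE implementation rather than this simplified recall:

```python
def bigrams(tokens):
    """Set of adjacent token pairs."""
    return {(a, b) for a, b in zip(tokens, tokens[1:])}

def rouge2_recall(candidate, reference):
    """Fraction of the reference's bigrams that also appear in the candidate."""
    ref = bigrams(reference.split())
    if not ref:
        return 0.0
    return len(ref & bigrams(candidate.split())) / len(ref)

def best_segment(windows, sub_summary):
    """Pick the dialogue window most relevant to one sub-summary; the
    remaining windows can then serve as negative examples."""
    return max(windows, key=lambda w: rouge2_recall(w, sub_summary))

seg = best_segment(
    ["I will be there around 7pm", "pizza always"],
    "she will be there around 7pm",
)
```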
  • the full-text dialogue summary generation module 76 is used to build a full-text summary generation model and generate a full-text dialogue summary for the dialogue content.
  • the full-text dialogue summary generation module 76 can be used to model the full-text summary generation task as a sequence-to-sequence learning problem; set the training goal of the full-text summary generation model to learning the optimal model parameters while minimizing the negative log-likelihood; and determine the objective function of the full-text summary generation model.
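The negative log-likelihood objective referred to here is the standard sequence-to-sequence formulation; in notation chosen for illustration (with D the dialogue content, y_1..y_{|T_D|} the reference summary tokens, and θ the model parameters):

```latex
\mathcal{L}_{\mathrm{sum}}(\theta) \;=\; -\sum_{i=1}^{|T_D|} \log p_\theta\!\left(y_i \mid y_{1:i-1},\, D\right)
```

Minimizing this loss over θ is equivalent to maximizing the likelihood of generating the full reference summary token by token.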
  • the multi-task learning module 77 is used to train the dialogue sentence coherence detection model, the sub-summary generation model, and the full-text summary generation model by updating their parameters alternately.
  • the multi-task learning module 77 may be used to update parameters during training using, in turn, the objective function of the dialogue sentence coherence detection model, the objective function of the sub-summary generation model, and the objective function of the full-text summary generation model, so as to train the three models; during training, the dialogue sentence coherence detection model and the sub-summary generation model serve as auxiliary tasks that improve the quality of the summaries generated by the full-text summary generation model.
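The alternating parameter-update scheme can be sketched as a loop that takes one optimization step per objective on each batch. The callback-style interface is an assumption made to keep the sketch self-contained; in practice each step would run a backward pass on the corresponding loss over shared model parameters:

```python
def train_multitask(batches, step_coherence, step_subsummary, step_full):
    """Alternate parameter updates: for every batch, take one step on each
    objective in turn, so the two contrastive tasks act as auxiliaries to
    the full-text summarization task. Each step_* callable performs one
    update and returns its loss value."""
    losses = []
    for batch in batches:
        losses.append((
            step_coherence(batch),   # coherence detection objective
            step_subsummary(batch),  # sub-summary generation objective
            step_full(batch),        # full-text summary objective
        ))
    return losses

# Dummy steps standing in for real optimizer updates.
log = train_multitask(
    batches=[1, 2],
    step_coherence=lambda b: 0.5 / b,
    step_subsummary=lambda b: 0.4 / b,
    step_full=lambda b: 0.3 / b,
)
```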
  • the dialogue model training device is used to perform operations that implement the model training method described in any of the above embodiments.
  • the dialogue summary generation device and model training equipment provided by the above embodiments of the present disclosure, based on the idea of contrastive learning, model the switching relationship between different topics in the dialogue content by learning semantic coherence, thereby implicitly obtaining the topic segmentation information of the dialogue content and enabling summary generation for dialogue content on the same topic.
  • the conversation summary generating device in the above embodiments of the present disclosure does not require additional annotation information or a priori algorithms.
  • the end-to-end multi-task learning summary generation device provided by the above embodiments of the present disclosure can overcome the error-propagation shortcoming of the pipeline approach (i.e., first obtaining the dialogue topic information, then generating the summary based on that topic information).
  • Figure 8 is a schematic structural diagram of some embodiments of a computer device of the present disclosure. As shown in FIG. 8 , the computer device includes a memory 81 and a processor 82 .
  • the memory 81 is used to store instructions, and the processor 82 is coupled to the memory 81.
  • the processor 82 is configured to execute, based on the instructions stored in the memory, the conversation summary generation method described in any of the above embodiments and/or the model training method described in any of the above embodiments.
  • the computer device also includes a communication interface 83 for information interaction with other devices.
  • the computer device also includes a bus 84, through which the processor 82, the communication interface 83, and the memory 81 complete communication with each other.
  • the memory 81 may include high-speed RAM, and may also include non-volatile memory, such as at least one disk storage.
  • the memory 81 may also be a memory array.
  • the storage 81 may also be divided into blocks, and the blocks may be combined into virtual volumes according to certain rules.
  • processor 82 may be a central processing unit (CPU), or may be an application specific integrated circuit (ASIC), or one or more integrated circuits configured to implement embodiments of the present disclosure.
  • the above embodiments of the present disclosure propose a contrastive learning method based on the coherence of the dialogue content to implicitly model the topic information of the dialogue content, addressing the pain points of dialogue summarization scenarios where topic information is hard to obtain and annotation is expensive.
  • a non-transitory computer-readable storage medium stores computer instructions that, when executed by a processor, implement the conversation summary generation method and/or the model training method described in any of the above embodiments.
  • embodiments of the present disclosure may be provided as methods, apparatuses, or computer program products. Accordingly, the present disclosure may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present disclosure may take the form of a computer program product embodied on one or more computer-usable non-transitory storage media (including, but not limited to, disk memory, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.
  • these computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing apparatus to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means that implement the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
  • these computer program instructions may also be loaded onto a computer or other programmable data processing device, causing a series of operating steps to be performed on the computer or other programmable device to produce computer-implemented processing, such that the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
  • the conversation summary generation device described above can be implemented as a general-purpose processor, a programmable logic controller (PLC), a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or any suitable combination thereof, for performing the functions described in this application.
  • the program can be stored in a non-transitory computer-readable storage medium.
  • the storage medium mentioned above can be a read-only memory, a magnetic disk or an optical disk, etc.


Abstract

本公开涉及一种对话摘要生成方法和装置、模型训练方法和设备。该对话摘要生成方法包括:采用对比学习的方式对语义连贯性进行建模,确定对话主题信息,生成对话摘要生成模型;将目标对话内容输入对话摘要生成模型,得到目标对话内容的摘要信息。本公开能够隐式地学习对话内容的主题信息,并针对不同主题进行摘要的生成,而无需额外的标注。

Description

对话摘要生成方法和装置、模型训练方法和设备
相关申请的交叉引用
本申请是以CN申请号为202210338218.2,申请日为2022年4月1日的申请为基础,并主张其优先权,该CN申请的公开内容在此作为整体引入本申请中。
技术领域
本公开涉及对话摘要生成领域,特别涉及一种对话摘要生成方法和装置、模型训练方法和设备。
背景技术
在线沟通已经成为我们日常工作和生活中不可或缺的一种交流方式。在信息爆炸的时代,最重要的是呈现对话内容中最显著的事实,而不是冗长的话语,这对于在线客服和会议总结很有用。给定一段对话内容,生成式对话摘要旨在将对话的内容进行总结重述,只呈现出对话中的重要内容。
发明内容
根据本公开的一个方面,提供一种对话摘要生成方法,包括:
采用对比学习的方式对语义连贯性进行建模,确定对话主题信息,生成对话摘要生成模型;
将目标对话内容输入对话摘要生成模型,得到目标对话内容的摘要信息。
在本公开的一些实施例中,所述采用对比学习的方式对语义连贯性进行建模,确定对话主题信息,生成对话摘要生成模型包括:
构建连贯性检测模型,通过学习语义连贯性建模对话内容中不同主题之间的切换关系,得到对话内容的主题分割信息;
构建子摘要生成模型,为对话内容的每个主题生成对应的子摘要;
构建全文摘要生成模型,为对话内容生成全文对话摘要。
在本公开的一些实施例中,所述采用对比学习的方式对语义连贯性进行建模,确定对话主题信息,生成对话摘要生成模型还包括:
采用交替参数更新的方式将对话语句连贯性检测模型、子摘要生成模型和全文摘要生成模型进行模型训练。
在本公开的一些实施例中,所述采用交替参数更新的方式将对话语句连贯性检测模型、子摘要生成模型和全文摘要生成模型进行模型训练包括:
在训练过程中,采用对话语句连贯性检测模型的目标函数、子摘要生成模型的目标函数和全文摘要生成模型的目标函数依次更新参数,训练对话语句连贯性检测模型、子摘要生成模型和全文摘要生成模型;
在训练过程中,对话语句连贯性检测模型和子摘要生成模型作为辅助任务,用于提升全文摘要生成模型生成摘要的质量。
在本公开的一些实施例中,所述采用对比学习的方式对语义连贯性进行建模,确定对话主题信息,生成对话摘要生成模型还包括:
对对话内容进行预处理;
根据对话内容理解模型、子摘要生成模型和全文摘要生成模型的需求,构建相应的模型训练数据;
构建对话内容理解模型,对对话内容进行语义编码,并对对话内容理解模型进行训练。
在本公开的一些实施例中,所述对对话内容进行预处理包括:
将对话内容中不同说话者的说话内容添加说话人信息后,拼接在一起;
采用预训练模型的分词器将拼接后的对话内容进行分词,并保留第一预定数量的词作为模型的输入。
在本公开的一些实施例中,所述根据对话内容理解模型、子摘要生成模型和全文摘要生成模型的需求,构建相应的模型训练数据包括:
对于对话内容理解模型,根据对话内容中连续的对话语句构建的窗口作为正例模型训练数据,窗口内容的对话语句将顺序打乱后再重新拼接的数据作为负例模型训练数据;
对于子摘要生成模型,针对对话内容的每个主题,生成对应的正例模型训练数据和负例模型训练数据;
对于全文摘要生成模型,将全部对话内容作为模型输入,输出为完整的摘要。
在本公开的一些实施例中,所述构建连贯性检测模型,通过学习语义连贯性建模对话内容中不同主题之间的切换关系,得到对话内容的主题分割信息包括:
分别计算连贯性检测模型的正例模型训练数据和负例模型训练数据的连贯性得分;
随机选择第二预定数量的预定正负例对,基于对比学习计算相干损失;
基于边缘对比损失计算连贯性检测模型的目标函数。
在本公开的一些实施例中,所述构建子摘要生成模型,为对话内容的每个主题生成对应的子摘要包括:
将子摘要生成任务建模为序列到序列的学习问题;
确定子摘要生成任务中对话摘要片段和子摘要的不相关程度;
在训练阶段,随机选择第三预定数量的预定正负例对进行训练;
根据基于对比学习的边际损失函数确定子摘要生成模型的目标函数。
在本公开的一些实施例中,所述构建全文摘要生成模型,为对话内容生成全文对话摘要包括:
将全文摘要生成任务建模为一个序列到序列的学习问题;
将全文摘要生成模型的训练目标设定为学习最优模型参数并最小化负对数似然函数值;
确定全文摘要生成模型的目标函数。
根据本公开的另一方面,提供一种模型训练方法,包括:
采用对比学习的方式对语义连贯性进行建模,确定对话主题信息,生成对话摘要生成模型;
采用交替参数更新的方式对对话摘要生成模型进行模型训练,使得对话摘要生成模型根据输入的目标对话内容,输出目标对话内容的摘要信息。
在本公开的一些实施例中,所述采用对比学习的方式对语义连贯性进行建模,确定对话主题信息,生成对话摘要生成模型包括:
构建连贯性检测模型,通过学习语义连贯性建模对话内容中不同主题之间的切换关系,得到对话内容的主题分割信息;
构建子摘要生成模型,为对话内容的每个主题生成对应的子摘要;
构建全文摘要生成模型,为对话内容生成全文对话摘要。
在本公开的一些实施例中,所述采用交替参数更新的方式对对话摘要生成模型进行模型训练包括:
在训练过程中,采用对话语句连贯性检测模型的目标函数、子摘要生成模型的目标函数和全文摘要生成模型的目标函数依次更新参数,训练对话摘要生成模型,其中,对话摘要生成模型包括对话语句连贯性检测模型、子摘要生成模型和全文摘要生成模型;
在训练过程中,对话语句连贯性检测模型和子摘要生成模型作为辅助任务,用于提升全文摘要生成模型生成摘要的质量。
在本公开的一些实施例中,所述采用对比学习的方式对语义连贯性进行建模,确定对话主题信息,生成对话摘要生成模型还包括:
对对话内容进行预处理;
根据对话内容理解模型、子摘要生成模型和全文摘要生成模型的需求,构建相应的模型训练数据;
构建对话内容理解模型,对对话内容进行语义编码,并对对话内容理解模型进行训练。
在本公开的一些实施例中,所述对对话内容进行预处理包括:
将对话内容中不同说话者的说话内容添加说话人信息后,拼接在一起;
采用预训练模型的分词器将拼接后的对话内容进行分词,并保留第一预定数量的词作为模型的输入。
在本公开的一些实施例中,所述根据对话内容理解模型、子摘要生成模型和全文摘要生成模型的需求,构建相应的模型训练数据包括:
对于对话内容理解模型,根据对话内容中连续的对话语句构建的窗口作为正例模型训练数据,窗口内容的对话语句将顺序打乱后再重新拼接的数据作为负例模型训练数据;
对于子摘要生成模型,针对对话内容的每个主题,生成对应的正例模型训练数据和负例模型训练数据;
对于全文摘要生成模型,将全部对话内容作为模型输入,输出为完整的摘要。
在本公开的一些实施例中,所述构建连贯性检测模型,通过学习语义连贯性建模对话内容中不同主题之间的切换关系,得到对话内容的主题分割信息包括:
分别计算连贯性检测模型的正例模型训练数据和负例模型训练数据的连贯性得分;
在训练阶段,随机选择第二预定数量的预定正负例对,基于对比学习计算相干损失;
基于边缘对比损失计算连贯性检测模型的目标函数。
在本公开的一些实施例中,所述构建子摘要生成模型,为对话内容的每个主题生成对应的子摘要包括:
将子摘要生成任务建模为序列到序列的学习问题;
确定子摘要生成任务中对话摘要片段和子摘要的不相关程度;
在训练阶段,随机选择第三预定数量的预定正负例对进行训练;
根据基于对比学习的边际损失函数确定子摘要生成模型的目标函数。
在本公开的一些实施例中,所述构建全文摘要生成模型,为对话内容生成全文对话摘要包括:
将全文摘要生成任务建模为一个序列到序列的学习问题;
将全文摘要生成模型的训练目标设定为学习最优模型参数并最小化负对数似然函数值;
确定全文摘要生成模型的目标函数。
根据本公开的另一方面,提供一种对话摘要生成装置,包括:
模型生成模块,用于采用对比学习的方式对语义连贯性进行建模,确定对话主题信息,生成对话摘要生成模型;
对话摘要确定模块,用于将目标对话内容输入对话摘要生成模型,得到目标对话内容的摘要信息。
在本公开的一些实施例中,所述对话摘要生成装置用于执行实现如上述任一实施例所述的对话摘要生成方法的操作。
根据本公开的另一方面,提供一种模型训练设备,包括:
模型生成单元,用于采用对比学习的方式对语义连贯性进行建模,确定对话主题信息,生成对话摘要生成模型;
模型训练单元,用于采用交替参数更新的方式对对话摘要生成模型进行模型训练,使得对话摘要生成模型根据输入的目标对话内容,输出目标对话内容的摘要信息。
在本公开的一些实施例中,所述对话模型训练设备用于执行实现如上述任一实施例所述的模型训练方法的操作。
根据本公开的另一方面,提供一种计算机装置,包括:
存储器,用于存储指令;
处理器,用于执行所述指令,使得所述计算机装置执行实现如上述任一实施例所述的对话摘要生成方法、和/或如上述任一实施例所述的模型训练方法的操作。
根据本公开的另一方面,提供一种非瞬时性计算机可读存储介质,其中,所述非瞬时性计算机可读存储介质存储有计算机指令,所述指令被处理器执行时实现如上述任一实施例所述的对话摘要生成方法、和/或如上述任一实施例所述的模型训练方法。
附图说明
为了更清楚地说明本公开实施例或相关技术中的技术方案,下面将对实施例或相关技术描述中所需要使用的附图作简单地介绍,显而易见地,下面描述中的附图仅仅是本公开的一些实施例,对于本领域普通技术人员来讲,在不付出创造性劳动的前提下,还可以根据这些附图获得其他的附图。
图1为本公开对话摘要生成方法一些实施例的示意图。
图2为本公开一些实施例中对话内容和摘要的示意图。
图3为本公开模型训练方法一些实施例的示意图。
图4为本公开模型训练方法另一些实施例的示意图。
图5为本公开对话摘要生成装置一些实施例的示意图。
图6为本公开模型训练设备一些实施例的示意图。
图7为本公开模型训练设备另一些实施例的示意图。
图8为本公开计算机装置一些实施例的结构示意图。
具体实施方式
下面将结合本公开实施例中的附图,对本公开实施例中的技术方案进行清楚、完整地描述,显然,所描述的实施例仅仅是本公开一部分实施例,而不是全部的实施例。以下对至少一个示例性实施例的描述实际上仅仅是说明性的,决不作为对本公开及其应用或使用的任何限制。基于本公开中的实施例,本领域普通技术人员在没有做出创造性劳动前提下所获得的所有其他实施例,都属于本公开保护的范围。
除非另外具体说明,否则在这些实施例中阐述的部件和步骤的相对布置、数字表达式和数值不限制本公开的范围。
同时,应当明白,为了便于描述,附图中所示出的各个部分的尺寸并不是按照实际的比例关系绘制的。
对于相关领域普通技术人员已知的技术、方法和设备可能不作详细讨论,但在适当情况下,所述技术、方法和设备应当被视为授权说明书的一部分。
在这里示出和讨论的所有示例中,任何具体值应被解释为仅仅是示例性的,而不是作为限制。因此,示例性实施例的其它示例可以具有不同的值。
应注意到:相似的标号和字母在下面的附图中表示类似项,因此,一旦某一项在一个附图中被定义,则在随后的附图中不需要对其进行进一步讨论。
发明人通过研究发现:相关技术尝试利用对话的内在信息来解决对话摘要的挑战,例如对话的主题特征(即话题)、对话行为和对话阶段。尽管此类模型已经证明了对话相关的信息在生成对话摘要方面的有效性,但仍然存在以下问题:该方法往往需要额外的人工标注或先验算法来获得对话的话题特征、对话行为等信息,这往往会耗费很大的人力和机器资源。
发明人通过研究还发现:相关技术上游先验算法的结果甚至专家标注有可能是错误的,这种先获取对话相关信息,再进行对话摘要生成的流水线式方法,会导致上游错误的传递,影响到下游对话摘要模型的效果。
鉴于以上技术问题中的至少一项,本公开提供了一种对话摘要生成方法和装置、模型训练方法和设备,能够隐式地学习对话内容的主题信息,并针对不同主题进行摘要的生成。下面通过具体实施例对本公开进行说明。
图1为本公开对话摘要生成方法一些实施例的示意图。优选的,本实施例可由本公开对话摘要生成装置或本公开计算机装置执行。该方法可以包括以下步骤中的至少一个步骤,其中:
步骤100,采用对比学习的方式对语义连贯性进行建模,确定对话主题信息,生成对话摘要生成模型。
步骤200,将目标对话内容输入对话摘要生成模型,得到目标对话内容的摘要信息。
图2为本公开一些实施例中对话内容和摘要的示意图。如图2所示,该对话内容共有三个主题,对应的话题片段分别用S1、S2和S3标记。S1为“当前情况”(图2第1-7行),S2为“到达时间”(图2第8-10行),S3为“吃的食物”(图2第11-18行)。图2第19-21行为每个主题的摘要,每个主题的中心思想用一个句子概括,即t1代表S1的摘要,t2代表S2的摘要,t3代表S3的摘要。
如图2所示,来自同一主题的句子(比如S1,S2,S3内的对话内容)比来自不同主题的句子(例如,主题间片段S4和S5)连贯性更好。这一点揭示了主题和语句连贯性之间的潜在关系。与结构化文本可以将段落或部分视为自然主题段不同,很难准确地对对话主题进行切分。本公开上述实施例基于主题和话语连贯性之间的内在关系,提出以对比学习的方式对话语连贯性进行建模来隐式捕获对话主题信息。本公开上述实施例构建了连贯性检测模块来推动模型更多地关注更加连贯且可能包含来自相同主题的显著信息的片段。
此外,由于本公开上述实施例的目标是为对话中的每个主题生成更好的摘要,因此我们还引入了子摘要生成模块,可以帮助模型识别最显著的信息并生成相应的摘要。
本公开能够隐式地学习对话内容的主题信息,并针对不同主题进行摘要的生成,而无需额外的标注。
本公开上述实施例的这两个模块都是以对比学习的方式构建的,不需要额外的人工注释或额外的算法。本公开上述实施例的这两个模块可以通过交替参数更新策略与全文对话摘要任务相结合,从而形成本公开的最终模型。
在本公开的一些实施例中,图1实施例的步骤100可以包括模型训练方法,即采用交替参数更新的方式对对话摘要生成模型进行模型训练。
图3为本公开模型训练方法一些实施例的示意图。优选的,本实施例可由本公开模型训练设备、本公开对话摘要生成装置或本公开计算机装置执行。如图3所示,本公开模型训练方法或本公开对话摘要生成方法(例如图1实施例的步骤100)可以包括步骤101-步骤102中的至少一个步骤,其中:
步骤101,采用对比学习的方式对语义连贯性进行建模,确定对话主题信息,生成对话摘要生成模型。
步骤102,采用交替参数更新的方式对对话摘要生成模型进行模型训练,使得对话摘要生成模型根据输入的目标对话内容,输出目标对话内容的摘要信息。
图4为本公开模型训练方法另一些实施例的示意图。优选的,本实施例可由本公开模型训练设备、本公开对话摘要生成装置或本公开计算机装置执行。如图4所示,本公开模型训练方法或本公开对话摘要生成方法(例如图1实施例的步骤100)可以包括步骤110-步骤170中的至少一个步骤,其中:
步骤110,对对话内容进行预处理。
在本公开的一些实施例中,如图2所示,对话摘要生成任务的训练数据包含两部分,即对话内容和其对应的摘要。步骤110主要针对对话内容进行预处理。
在本公开的一些实施例中,步骤110可以包括步骤111-步骤112中的至少一个步骤,其中:
步骤111,将对话内容中不同说话者的说话内容添加说话人信息后,拼接在一起。
以图2对话的前两句为例进行说明,将“Where are you?”这句话的说话者“Julia”添加在这句话前面,表示此句是由Julia所说,得到“Julia:Where are you?”。将“That’s a good question,haha”前面加入说话者“Hania”得到“Hania:That’s a good question,haha”。将二者拼接后得到“Julia:Where are you?Hania:That’s a good question,haha”。
步骤112,采用预训练模型的分词器将拼接后的对话内容进行分词,并保留第一预定数量的词作为模型的输入。
在本公开的一些实施例中,预训练模型的分词器可以为BART(Bidirectional and Auto-Regressive Transformers,双向自回归转换器)。
在本公开的一些实施例中,预训练模型可以为BART、STEP、PEGASUS(Pre-training with Extracted Gap-sentences for Abstractive Summarization,通过预训练为抽象摘要提取间隙句)等模型。
在本公开的一些实施例中,第一预定数量的词可以为前1024个词。
步骤120,根据对话内容理解模型、子摘要生成模型和全文摘要生成模型的需求,构建相应不同形式的模型训练数据。
在本公开的一些实施例中,步骤120可以包括步骤121-步骤123中的至少一个步骤,其中:
步骤121,对于对话内容理解模型,根据对话内容中连续的对话语句构建的窗口作为正例模型训练数据,窗口内容的对话语句将顺序打乱后再重新拼接的数据作为负例模型训练数据。
在本公开的一些实施例中,步骤121,对于对话内容理解模型,根据对话内容来构造训练数据,不涉及摘要部分。
在本公开的一些实施例中,步骤121可以包括:将由k个连续的对话语句构建的窗口w作为正例,将窗口w内的语句顺序打乱后再重新拼接后作为负例。以图2中的对话为例说明,可以选取包含3个连续的对话语句的窗口“Hania:I will be there around 7pm I guess:(Julia:I will be waiting!:*Hania:Great!”作为正例,将窗口内的语句顺序打乱后再进行拼接得到“Hania:Great!Hania:I will be there around 7pm I guess:(Julia:I will be waiting!:*”,此时新得到的语句窗口作为负例。
步骤122,对于子摘要生成模型,针对对话内容的每个主题,生成对应的正例模型训练数据和负例模型训练数据。
在本公开的一些实施例中,步骤122可以包括:针对分主题子摘要生成模型,将联合对话内容和对话摘要一起构建训练数据。长对话内容的摘要总是由多个句子组成,每个句子都被视为一个子摘要。考虑到一个对话可能包含多个主题,本公开假设每个子摘要与一个主题相关。通过句子结束符(如句号,问号,感叹号等)将整个摘要分成单个句子,每一个单句则为一个子摘要。
在本公开的一些实施例中,为了方便描述,给定一段对话内容D=(u1,u2,...,u|D|),这里本公开将与此对话内容的相应目标摘要表示为TD=(t1,t2,...,tm),其中每个ti为子摘要, m为子摘要的个数,|D|表示对话内容中包含的句子个数。给定一个子摘要,本公开可以根据ROUGE-2召回分数从对话D中重新检索最相关的片段Sipos。具体地,给定一个整数窗口大小w∈[a,b](0<a≤b<|D|,a、b为超参数),我们可以将窗口滑过对话内容D,并以半个窗口大小w/2作为步长,从左向右滑动窗口,并获得一个候选集合片段的集合W。枚举集合W中的每个候选片段,并将其与子摘要计算ROUGE-2召回分数,我们可以得到得分最高的对话片段,它被选为与子摘要ti最相关的片段,并作为正样本Sipos。相应的反例是从集合W中其余的候选中随机挑选的,表示为Sineg。
在本公开的一些实施例中,本公开已经构建了一个训练数据{(Sipos,ti),(Sineg,ti)}。为简单起见,以图2对话中S2和t2的部分为例进行说明,即对话内容D={Hania:I will be there around 7pm I guess:(Julia:I will be waiting!:*Hania:Great.},|D|为3。本公开选定窗口大小w=2,步长为w/2=1,那么候选集合W中包含两个候选,分别为候选1“Hania:I will be there around 7pm I guess:(Julia:I will be waiting!:*”,候选2“Julia:I will be waiting!:*Hania:Great.”。针对图2下方绿色标示的子摘要“She will get there around 7pm.”,利用ROUGE-2召回分数得到候选1和该子摘要的得分最高,被选为正例。而候选2则作为负例。也就是说,与该子摘要ROUGE得分最高的对话内容是图中绿色标示的部分“Hania:I will be there around 7pm I guess:(Julia:I will be waiting!:*Hania:Great!”。基于此,绿色的对话内容(作为模型输入)以及与其相对应的子摘要(作为模型输入)构成一个正例训练数据。而要构造负例训练数据,则只需要任意选择一个其他的对话内容窗口作为模型输入,子摘要依然为模型输出。还是以图2中的对话为例说明,以对话内容“Julia:Where are you…Julia:I know how you feel love,I am sick of trains already:(”作为模型输入,子摘要“She will get there around 7pm.”作为模型输出,这种输入输出的搭配则构成了一个负例训练数据。
在本公开的一些实施例中,步骤122可以包括:在训练数据构建模块针对子摘要生成模块的训练数据构建时,选用Rouge-2召回得分来匹配子摘要和对话内容窗口。
在本公开的一些实施例中,采用自动文摘评测方法Rouge-2召回得分可替换为采用BertScore方式召回得分。
步骤123,对于全文摘要生成模型,将全部对话内容作为模型输入,输出为完整的摘要。
以图2中的对话为例说明,全部的对话内容“Julia:Where are you…Hania:Pizza always:D…”都作为输入,三句子摘要构成的对话摘要“Hania has been…pizza for her.” 则作为输出。
步骤130,构建对话内容理解模型,对对话内容进行语义编码,并对对话内容理解模型进行训练。
在本公开的一些实施例中,步骤130可以包括:采用BART的Transformer encoder(转换器编码器)的网络结构,将对话内容编码为语意向量。例如给定一段对话的内容(由训练数据构造模块得到)D,步骤130将理解此对话内容并输出该对话内容的语意表示向量E。
步骤140,构建连贯性检测模型,通过学习语义连贯性建模对话内容中不同主题之间的切换关系,得到对话内容的主题分割信息。
在本公开的一些实施例中,连贯性检测模型,用于检测输入的对话内容窗口内的对话的语意连贯性。
在本公开的一些实施例中,步骤140可以包括步骤141-步骤143中的至少一个步骤,其中:
步骤141,分别计算连贯性检测模型的正例模型训练数据和负例模型训练数据的连贯性得分。
在本公开的一些实施例中,步骤141可以包括:针对连贯性检测模型,由步骤120和步骤130基于原始的对话内容构造正例Spos和负例Sneg;将正、负例数据输入对话内容理解模块得到各自的语意向量表示,分别记为如公式(1)所示;然后,通过公式(2)可以计算各自的连贯性得分co,并计算正例、负例的归一化的连贯性得分。

公式(1)中,w1,b1都是模型参数。本公开上述实施例通过公式(2)可以计算正例、负例的相关性得分,公式(2)利用softmax的归一化特性,计算正例、负例的归一化的连贯性得分。
步骤142,随机选择第二预定数量的预定正负例对,基于对比学习计算相干损失。
在本公开的一些实施例中,第二预定数量可以为Nco
在本公开的一些实施例中,步骤142可以包括:对于一个对话内容D,为了简单起见,在模型训练期间随机选择Nco个正负例对。然后计算基于对比学习的相干损失值如公式(3)所示。
公式(3)中,δco是一个边际系数,本公开希望正例片段的连贯性得分大于负片段的得分。k,Nco和δco都是超参数。
步骤143,基于边缘对比损失计算连贯性检测模型的目标函数。
在本公开的一些实施例中,步骤143可以包括:给定一个批次的数据,B=(<D1,TD1>,<D2,TD2>,…,<D|B|,TD|B|>),其中,Di为第i个对话,TDi为Di对应的摘要,|B|为该批次中数据的个数,由对话语句连贯性模块计算得到的基于边缘对比损失的目标函数如公式(4)所示。
步骤150,构建子摘要生成模型,为对话内容的每个主题生成对应的子摘要。
在本公开的一些实施例中,步骤150可以包括步骤151-步骤154中的至少一个步骤,其中:
步骤151,将子摘要生成任务建模为序列到序列的学习问题。
在本公开的一些实施例中,步骤151可以包括:根据公式(5)和公式(6)确定正例和负例输入情况下的负对数似然值。

公式(5)和公式(6)中,指的是ti子摘要中第j个字,代表位置j之前的所有字。θ表示模型参数,分别表示正例、负例的输入。表示在前j个字的基础上生成第j个字的概率。遍历所有的j,并将所有的p概率值相乘,得到生成完整的ti子摘要的概率,再取负的log,得到生成ti子摘要的负对数损失函数L,即负对数似然。针对表示在输入为时,生成ti子摘要的负对数损失函数
步骤152,确定子摘要生成任务中对话摘要片段和子摘要的不相关程度。
在本公开的一些实施例中,步骤152可以包括:根据公式(7)确定对话摘要片段和子摘要的不相关程度,其中,softmax层之后的归一化分数可以看作是无关分数,本公开采用归一化的分值来表示一个对话摘要片段和子摘要的不相关程度。
步骤153,在训练阶段,随机选择第三预定数量的预定正负例对进行训练。
在本公开的一些实施例中,步骤153可以包括:给定一个对话D和其对应的摘要TD,其中TD包含m个子摘要,至少可以构造m对正负例对作为训练数据,为了简化计算过程,在训练阶段,我们随机选择Nsu(Nsu<m)对正负例进行训练。
步骤154,根据基于对比学习的边际损失函数确定子摘要生成模型的目标函数。
在本公开的一些实施例中,步骤154可以包括:针对对话内容D,可以构建一个基于对比学习的边际损失函数,如公式(8)所示。
公式(8)中,δsu为边际系数,通过这个系数,本公开希望一个正片段和一个子摘要之间的相关性得分至少大于该子摘要和负片段的相关性得分。Nsu和δsu为超参数。
在本公开的一些实施例中,步骤154还可以包括:对于一个批次的对话摘要训练数据B=(<D1,TD1>,<D2,TD2>,…,<D|B|,TD|B|>),其中|B|为该批次中数据的个数,则根据公式(9)确定负对数似然目标函数。
步骤160,构建全文摘要生成模型,为对话内容生成全文对话摘要。
在本公开的一些实施例中,全文摘要生成模型,用于采用对话内容和对应完整摘要作为训练数据。
在本公开的一些实施例中,步骤160可以包括步骤161-步骤163中的至少一个步骤,其中:
步骤161,将全文摘要生成任务建模为一个序列到序列的学习问题。
步骤162,将全文摘要生成模型的训练目标设定为学习最优模型参数并最小化负对数似然函数值。
在本公开的一些实施例中,步骤162可以包括:给定一个对话内容D=(u1,u2,...,u|D|),由|D|个对话语句组成,加上其对应的摘要TD=(y1,y2,...,y|TD|),由|TD|个子摘要组成,全文对话摘要生成的目标是学习最优模型参数θ并最小化以下负对数似然值,如公式(10)所示。
公式(10)中,y1:i-1表示输出序列的前i-1个标记(即,y1:i-1=(y1,y2,...,yi-1))。
步骤163,确定全文摘要生成模型的目标函数。
在本公开的一些实施例中,步骤163可以包括:对于某一批次的对话摘要对B=(〈D1,TD1〉,〈D2,TD2〉,...,〈D|B|,TD|B|〉),基于负对数似然的目标函数计算如公式(11)所示。
步骤170,采用交替参数更新的方式将对话语句连贯性检测模型、子摘要生成模型和全文摘要生成模型联合起来一起进行模型训练。
在本公开的一些实施例中,步骤170可以包括步骤171-步骤172中的至少一个步骤,其中:
步骤171,在训练过程中,采用对话语句连贯性检测模型的目标函数、子摘要生成模型的目标函数和全文摘要生成模型的目标函数依次更新参数,训练对话语句连贯性检测模型、子摘要生成模型和全文摘要生成模型。
步骤172,在训练过程中,两个基于对比学习的目标作为辅助任务,用于提升全文摘要生成模型生成摘要的质量。即,对话语句连贯性检测模型和子摘要生成模型作为辅助任务,可以有助于训练阶段的主要对话摘要任务。
在本公开的一些实施例中,在训练过程中,对话语句连贯性检测模块和子摘要生成模块作为辅助任务,用于帮助全文摘要生成模块提高生成摘要的质量。
在本公开的一些实施例中,训练结束后,这两个模型及其对应的数据预处理模块便不再需要。对新的对话内容进行摘要生成时(图1实施例的步骤200中),本公开只需要对话内容理解模块和全文摘要生成模块对对话内容进行理解和摘要生成即可。
在本公开的一些实施例中,本公开的主干网络结构是transformer network(转换器网络),该网络主要是用于理解输入文本并生成对应的摘要。
在本公开的一些实施例中,transformer network网络结构也可以采用基于RNN(Recurrent Neural Network,循环神经网络)和CNN(Convolutional Neural Networks,卷积神经网络)的端到端生成模型进行替代。
本公开上述实施例提供的对话摘要生成方法和模型训练方法,基于对比学习的思想,提出通过学习语义连贯性来建模对话内容中不同主题之间的切换关系,从而隐式地得到对话内容的主题分割信息,并且能够针对同一主题的对话内容进行摘要生成。本公开上述实施例的方法不需要额外的标注信息或先验算法。
本公开上述实施例提供的多任务学习的端到端的摘要生成方法,可以解决流水线式方法(即先得到对话主题信息,再根据主题信息进行摘要生成的方式)的错误传递缺点的技术问题。
在对话摘要生成场景中,针对对话主题信息获取难、标注代价昂贵的痛点,本公开上述实施例提出了基于对话内容连贯性的对比学习方式来隐式建模对话内容的主题信息。
针对对话内容的不同主题,本公开上述实施例引入子摘要概念并设计基于对比学习的子摘要生成方式,能够有效帮助模型在获取对话内容主题信息后,针对同一主题的内容进行子摘要的生成。
本公开上述实施例针对对话内容连贯性检测模块和子摘要生成模块的训练数据采用了正负例的构建方式。
本公开上述实施例联合两个对比式目标函数(即对话语句连贯性检测模块和子摘要生成模块的目标函数)和全文摘要生成模块的目标函数的多任务学习方式,能够支持模型端到端的进行对话摘要的生成,避免了错误传递。
图5为本公开对话摘要生成装置一些实施例的示意图。如图5所示,本公开对话摘要生成装置可以包括模型生成模块51和对话摘要确定模块52,其中:
模型生成模块51,用于采用对比学习的方式对语义连贯性进行建模,确定对话主题信息,生成对话摘要生成模型。
在本公开的一些实施例中,模型生成模块51可以实现为本公开的模型训练设备。
对话摘要确定模块52,用于将目标对话内容输入对话摘要生成模型,得到目标对话内容的摘要信息。
在本公开的一些实施例中,所述对话摘要生成装置用于执行实现如上述任一实施例所述的对话摘要生成方法的操作。
图6为本公开模型训练设备一些实施例的示意图。如图6所示,本公开模型训练设备(例如图5实施例的模型生成模块51)可以包括模型生成单元61和模型训练单元62,其中:
模型生成单元61,用于采用对比学习的方式对语义连贯性进行建模,确定对话主题信息,生成对话摘要生成模型。
模型训练单元62,用于采用交替参数更新的方式对对话摘要生成模型进行模型训练,使得对话摘要生成模型根据输入的目标对话内容,输出目标对话内容的摘要信息。
图7为本公开模型训练设备另一些实施例的示意图。如图7所示,本公开模型训练设备(例如图5实施例的模型生成模块51)可以包括对话内容预处理模块71、训练数据构建模块72、对话内容理解模块73、对话语句的连贯性检测模块74、分主题子摘要生成模块75、全文对话摘要生成模块76和多任务学习模块77,其中:
对话内容预处理模块71,用于对对话内容进行预处理。
在本公开的一些实施例中,对话内容预处理模块71可以用于将对话内容中不同说话者的说话内容添加说话人信息后,拼接在一起;采用预训练模型的分词器将拼接后的对话内容进行分词,并保留第一预定数量的词作为模型的输入。
训练数据构建模块72,用于根据对话内容理解模型、子摘要生成模型和全文摘要生成模型的需求,构建相应的模型训练数据。
在本公开的一些实施例中,训练数据构建模块72,可以用于对于对话内容理解模型,根据对话内容中连续的对话语句构建的窗口作为正例模型训练数据,窗口内容的对话语句将顺序打乱后再重新拼接的数据作为负例模型训练数据;对于子摘要生成模型,针对对话内容的每个主题,生成对应的正例模型训练数据和负例模型训练数据;对于全文摘要生成模型,将全部对话内容作为模型输入,输出为完整的摘要。
对话内容理解模块73,用于构建对话内容理解模型,对对话内容进行语义编码,并对对话内容理解模型进行训练。
对话语句的连贯性检测模块74,用于构建连贯性检测模型,通过学习语义连贯性建模对话内容中不同主题之间的切换关系,得到对话内容的主题分割信息。
在本公开的一些实施例中,对话语句的连贯性检测模块74,可以用于分别计算连贯性检测模型的正例模型训练数据和负例模型训练数据的连贯性得分;随机选择第二预定数量的预定正负例对,基于对比学习计算相干损失;基于边缘对比损失计算连贯性检测模型的目标函数。
分主题子摘要生成模块75,用于构建子摘要生成模型,为对话内容的每个主题生成对应的子摘要。
在本公开的一些实施例中,分主题子摘要生成模块75,可以用于将子摘要生成任务建模为序列到序列的学习问题;确定子摘要生成任务中对话摘要片段和子摘要的不相关程度;在训练阶段,随机选择第三预定数量的预定正负例对进行训练;根据基于对比学习的边际损失函数确定子摘要生成模型的目标函数。
全文对话摘要生成模块76,用于构建全文摘要生成模型,为对话内容生成全文对话摘要。
在本公开的一些实施例中,全文对话摘要生成模块76,可以用于将全文摘要生成任务建模为一个序列到序列的学习问题;将全文摘要生成模型的训练目标设定为学习最优模型参数并最小化负对数似然函数值;确定全文摘要生成模型的目标函数。
多任务学习模块77,用于采用交替参数更新的方式将对话语句连贯性检测模型、子摘要生成模型和全文摘要生成模型进行模型训练。
在本公开的一些实施例中,多任务学习模块77可以用于在训练过程中,采用对话语句连贯性检测模型的目标函数、子摘要生成模型的目标函数和全文摘要生成模型的目标函数依次更新参数,训练对话语句连贯性检测模型、子摘要生成模型和全文摘要生成模型;在训练过程中,对话语句连贯性检测模型和子摘要生成模型作为辅助任务,用于提升全文摘要生成模型生成摘要的质量。
在本公开的一些实施例中,所述对话模型训练设备用于执行实现如上述任一实施例所述的模型训练方法的操作。
本公开上述实施例提供的对话摘要生成装置和模型训练设备,基于对比学习的思想,提出通过学习语义连贯性来建模对话内容中不同主题之间的切换关系,从而隐式地得到对话内容的主题分割信息,并且能够针对同一主题的对话内容进行摘要生成。本公开上述实施例的对话摘要生成装置不需要额外的标注信息或先验算法。
本公开上述实施例提供的多任务学习的端到端的摘要生成装置,可以解决流水线式方式(即先得到对话主题信息,再根据主题信息进行摘要生成的方式)的错误传递缺点的技术问题。
图8为本公开计算机装置一些实施例的结构示意图。如图8所示,计算机装置包括存储器81和处理器82。
存储器81用于存储指令,处理器82耦合到存储器81,处理器82被配置为基于存储器存储的指令执行实现上述实施例所述的对话摘要生成方法、和/或如上述任一实施例所述的模型训练方法。
如图8所示,该计算机装置还包括通信接口83,用于与其它设备进行信息交互。同时,该计算机装置还包括总线84,处理器82、通信接口83、以及存储器81通过总线84完成相互间的通信。
存储器81可以包含高速RAM存储器,也可还包括非易失性存储器(non-volatile memory),例如至少一个磁盘存储器。存储器81也可以是存储器阵列。存储器81还可能被分块,并且块可按一定的规则组合成虚拟卷。
此外,处理器82可以是一个中央处理器CPU,或者可以是专用集成电路ASIC,或是被配置成实施本公开实施例的一个或多个集成电路。
根据本公开的另一方面,提供一种非瞬时性计算机可读存储介质,其中,所述非瞬时性计算机可读存储介质存储有计算机指令,所述指令被处理器执行时实现如上述任一实施例所述的对话摘要生成方法、和/或如上述任一实施例所述的模型训练方法。
本领域内的技术人员应明白,本公开的实施例可提供为方法、装置、或计算机程序产品。因此,本公开可采用完全硬件实施例、完全软件实施例、或结合软件和硬件方面的实施例的形式。而且,本公开可采用在一个或多个其中包含有计算机可用程序代码的计算机可用非瞬时性存储介质(包括但不限于磁盘存储器、CD-ROM、光学存储器等)上实施的计算机程序产品的形式。
本公开是参照根据本公开实施例的方法、设备(***)和计算机程序产品的流程图和/或方框图来描述的。应理解可由计算机程序指令实现流程图和/或方框图中的每一流程和/或方框以及流程图和/或方框图中的流程和/或方框的结合。可提供这些计算机程序指令到通用计算机、专用计算机、嵌入式处理机或其他可编程数据处理设备的处理器以产生一个机器,使得通过计算机或其他可编程数据处理设备的处理器执行的指令产生用于实现在流程图一个流程或多个流程和/或方框图一个方框或多个方框中指定的功能的装置。
这些计算机程序指令也可存储在能引导计算机或其他可编程数据处理设备以特定方式工作的计算机可读存储器中,使得存储在该计算机可读存储器中的指令产生包括指令装置的制造品,该指令装置实现在流程图一个流程或多个流程和/或方框图一个方框或多个方框中指定的功能。
这些计算机程序指令也可装载到计算机或其他可编程数据处理设备上,使得在计算机或其他可编程设备上执行一系列操作步骤以产生计算机实现的处理,从而在计算机或其他可编程设备上执行的指令提供用于实现在流程图一个流程或多个流程和/或方框图一个方框或多个方框中指定的功能的步骤。
在上面所描述的对话摘要生成装置可以实现为用于执行本申请所描述功能的通用处理器、可编程逻辑控制器(PLC)、数字信号处理器(DSP)、专用集成电路(ASIC)、现场可编程门阵列(FPGA)或者其他可编程逻辑器件、分立门或者晶体管逻辑器件、分立硬件组件或者其任意适当组合。
至此,已经详细描述了本公开。为了避免遮蔽本公开的构思,没有描述本领域所公知的一些细节。本领域技术人员根据上面的描述,完全可以明白如何实施这里公开的技术方案。
本领域普通技术人员可以理解实现上述实施例的全部或部分步骤可以通过硬件来完成,也可以通过程序来指示相关的硬件完成,所述的程序可以存储于一种非瞬时性计算机可读存储介质中,上述提到的存储介质可以是只读存储器,磁盘或光盘等。
本公开的描述是为了示例和描述起见而给出的,而并不是无遗漏的或者将本公开限于所公开的形式。很多修改和变化对于本领域的普通技术人员而言是显然的。选择和描述实施例是为了更好说明本公开的原理和实际应用,并且使本领域的普通技术人员能够理解本公开从而设计适于特定用途的带有各种修改的各种实施例。

Claims (25)

  1. 一种对话摘要生成方法,包括:
    采用对比学习的方式对语义连贯性进行建模,确定对话主题信息,生成对话摘要生成模型;
    将目标对话内容输入对话摘要生成模型,得到目标对话内容的摘要信息。
  2. 根据权利要求1所述的对话摘要生成方法,其中,所述采用对比学习的方式对语义连贯性进行建模,确定对话主题信息,生成对话摘要生成模型包括:
    构建连贯性检测模型,通过学习语义连贯性建模对话内容中不同主题之间的切换关系,得到对话内容的主题分割信息;
    构建子摘要生成模型,为对话内容的每个主题生成对应的子摘要;
    构建全文摘要生成模型,为对话内容生成全文对话摘要。
  3. 根据权利要求2所述的对话摘要生成方法,其中,所述采用对比学习的方式对语义连贯性进行建模,确定对话主题信息,生成对话摘要生成模型还包括:
    采用交替参数更新的方式将对话语句连贯性检测模型、子摘要生成模型和全文摘要生成模型进行模型训练。
  4. 根据权利要求3所述的对话摘要生成方法,其中,所述采用交替参数更新的方式将对话语句连贯性检测模型、子摘要生成模型和全文摘要生成模型进行模型训练包括:
    在训练过程中,采用对话语句连贯性检测模型的目标函数、子摘要生成模型的目标函数和全文摘要生成模型的目标函数依次更新参数,训练对话语句连贯性检测模型、子摘要生成模型和全文摘要生成模型;
    在训练过程中,对话语句连贯性检测模型和子摘要生成模型作为辅助任务,用于提升全文摘要生成模型生成摘要的质量。
  5. 根据权利要求2-4中任一项所述的对话摘要生成方法,其中,所述采用对比学习的方式对语义连贯性进行建模,确定对话主题信息,生成对话摘要生成模型还包括:
    对对话内容进行预处理;
    根据对话内容理解模型、子摘要生成模型和全文摘要生成模型的需求,构建相应的模型训练数据;
    构建对话内容理解模型,对对话内容进行语义编码,并对对话内容理解模型进行训练。
  6. 根据权利要求5所述的对话摘要生成方法,其中,所述对对话内容进行预处理包括:
    将对话内容中不同说话者的说话内容添加说话人信息后,拼接在一起;
    采用预训练模型的分词器将拼接后的对话内容进行分词,并保留第一预定数量的词作为模型的输入。
  7. 根据权利要求5所述的对话摘要生成方法,其中,所述根据对话内容理解模型、子摘要生成模型和全文摘要生成模型的需求,构建相应的模型训练数据包括:
    对于对话内容理解模型,根据对话内容中连续的对话语句构建的窗口作为正例模型训练数据,窗口内容的对话语句将顺序打乱后再重新拼接的数据作为负例模型训练数据;
    对于子摘要生成模型,针对对话内容的每个主题,生成对应的正例模型训练数据和负例模型训练数据;
    对于全文摘要生成模型,将全部对话内容作为模型输入,输出为完整的摘要。
  8. 根据权利要求6所述的对话摘要生成方法,其中,所述构建连贯性检测模型,通过学习语义连贯性建模对话内容中不同主题之间的切换关系,得到对话内容的主题分割信息包括:
    分别计算连贯性检测模型的正例模型训练数据和负例模型训练数据的连贯性得分;
    在训练阶段,随机选择第二预定数量的预定正负例对,基于对比学习计算相干损失;
    基于边缘对比损失计算连贯性检测模型的目标函数。
  9. 根据权利要求6所述的对话摘要生成方法,其中,所述构建子摘要生成模型,为对话内容的每个主题生成对应的子摘要包括:
    将子摘要生成任务建模为序列到序列的学习问题;
    确定子摘要生成任务中对话摘要片段和子摘要的不相关程度;
    在训练阶段,随机选择第三预定数量的预定正负例对进行训练;
    根据基于对比学习的边际损失函数确定子摘要生成模型的目标函数。
  10. 根据权利要求6所述的对话摘要生成方法,其中,所述构建全文摘要生成模型,为对话内容生成全文对话摘要包括:
    将全文摘要生成任务建模为一个序列到序列的学习问题;
    将全文摘要生成模型的训练目标设定为学习最优模型参数并最小化负对数似然函数值;
    确定全文摘要生成模型的目标函数。
  11. 一种模型训练方法,包括:
    采用对比学习的方式对语义连贯性进行建模,确定对话主题信息,生成对话摘要生成模型;
    采用交替参数更新的方式对对话摘要生成模型进行模型训练,使得对话摘要生成模型根据输入的目标对话内容,输出目标对话内容的摘要信息。
  12. 根据权利要求11所述的模型训练方法,其中,所述采用对比学习的方式对语义连贯性进行建模,确定对话主题信息,生成对话摘要生成模型包括:
    构建连贯性检测模型,通过学习语义连贯性建模对话内容中不同主题之间的切换关系,得到对话内容的主题分割信息;
    构建子摘要生成模型,为对话内容的每个主题生成对应的子摘要;
    构建全文摘要生成模型,为对话内容生成全文对话摘要。
  13. The model training method according to claim 12, wherein said training the dialogue summary generation model by means of alternating parameter updates comprises:
    during training, updating parameters in turn using the objective function of the dialogue-utterance coherence detection model, the objective function of the sub-summary generation model, and the objective function of the full-text summary generation model, so as to train the dialogue summary generation model, wherein the dialogue summary generation model comprises the dialogue-utterance coherence detection model, the sub-summary generation model, and the full-text summary generation model;
    during training, using the dialogue-utterance coherence detection model and the sub-summary generation model as auxiliary tasks to improve the quality of the summaries generated by the full-text summary generation model.
  14. The model training method according to claim 12 or 13, wherein said modeling semantic coherence by means of contrastive learning, determining dialogue topic information, and generating a dialogue summary generation model further comprises:
    preprocessing the dialogue content;
    constructing corresponding model training data according to the requirements of the dialogue content understanding model, the sub-summary generation model, and the full-text summary generation model;
    constructing a dialogue content understanding model to semantically encode the dialogue content, and training the dialogue content understanding model.
  15. The model training method according to claim 14, wherein said preprocessing the dialogue content comprises:
    adding speaker information to the utterances of the different speakers in the dialogue content and concatenating them together;
    tokenizing the concatenated dialogue content with the tokenizer of a pretrained model, and keeping a first predetermined number of tokens as the model input.
  16. The model training method according to claim 14, wherein said constructing corresponding model training data according to the requirements of the dialogue content understanding model, the sub-summary generation model, and the full-text summary generation model comprises:
    for the dialogue content understanding model, using windows constructed from consecutive dialogue utterances in the dialogue content as positive model training data, and using data obtained by shuffling the order of the dialogue utterances within a window and re-concatenating them as negative model training data;
    for the sub-summary generation model, generating corresponding positive and negative model training data for each topic of the dialogue content;
    for the full-text summary generation model, taking the entire dialogue content as the model input and the complete summary as the output.
  17. The model training method according to claim 15, wherein said constructing a coherence detection model that models the switching relations between different topics in the dialogue content by learning semantic coherence, so as to obtain topic segmentation information of the dialogue content, comprises:
    computing coherence scores for the positive model training data and the negative model training data of the coherence detection model, respectively;
    in the training stage, randomly selecting a second predetermined number of predetermined positive-negative example pairs, and computing a coherence loss based on contrastive learning;
    computing the objective function of the coherence detection model based on a margin contrastive loss.
  18. The model training method according to claim 15, wherein said constructing a sub-summary generation model to generate a corresponding sub-summary for each topic of the dialogue content comprises:
    modeling the sub-summary generation task as a sequence-to-sequence learning problem;
    determining the degree of irrelevance between a dialogue summary segment and a sub-summary in the sub-summary generation task;
    in the training stage, randomly selecting a third predetermined number of predetermined positive-negative example pairs for training;
    determining the objective function of the sub-summary generation model according to a margin loss function based on contrastive learning.
  19. The model training method according to claim 15, wherein said constructing a full-text summary generation model to generate a full-text dialogue summary for the dialogue content comprises:
    modeling the full-text summary generation task as a sequence-to-sequence learning problem;
    setting the training objective of the full-text summary generation model to learning the optimal model parameters while minimizing the negative log-likelihood;
    determining the objective function of the full-text summary generation model.
  20. A dialogue summary generation apparatus, comprising:
    a model generation module configured to model semantic coherence by means of contrastive learning, determine dialogue topic information, and generate a dialogue summary generation model;
    a dialogue summary determination module configured to input target dialogue content into the dialogue summary generation model to obtain summary information of the target dialogue content.
  21. The dialogue summary generation apparatus according to claim 20, wherein the dialogue summary generation apparatus is configured to perform operations implementing the dialogue summary generation method according to any one of claims 1-10.
  22. A model training device, comprising:
    a model generation unit configured to model semantic coherence by means of contrastive learning, determine dialogue topic information, and generate a dialogue summary generation model;
    a model training unit configured to train the dialogue summary generation model by means of alternating parameter updates, so that the dialogue summary generation model outputs summary information of target dialogue content according to the target dialogue content that is input.
  23. The model training device according to claim 22, wherein the model training device is configured to perform operations implementing the model training method according to any one of claims 11-19.
  24. A computer apparatus, comprising:
    a memory configured to store instructions;
    a processor configured to execute the instructions, causing the computer apparatus to perform operations implementing the dialogue summary generation method according to any one of claims 1-10 and/or the model training method according to any one of claims 11-19.
  25. A non-transitory computer-readable storage medium, wherein the non-transitory computer-readable storage medium stores computer instructions which, when executed by a processor, implement the dialogue summary generation method according to any one of claims 1-10 and/or the model training method according to any one of claims 11-19.
PCT/CN2023/084642 2022-04-01 2023-03-29 Dialogue summary generation method and apparatus, and model training method and device WO2023185912A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210338218.2 2022-04-01
CN202210338218.2A CN116933801A (zh) 2022-04-01 2022-04-01 Dialogue summary generation method and apparatus, and model training method and device

Publications (1)

Publication Number Publication Date
WO2023185912A1 true WO2023185912A1 (zh) 2023-10-05

Family

ID=88199301

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/084642 WO2023185912A1 (zh) 2022-04-01 2023-03-29 Dialogue summary generation method and apparatus, and model training method and device

Country Status (2)

Country Link
CN (1) CN116933801A (zh)
WO (1) WO2023185912A1 (zh)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111368066A (zh) * 2018-12-06 2020-07-03 北京京东尚科信息技术有限公司 获取对话摘要的方法、装置和计算机可读存储介质
CN111639175A (zh) * 2020-05-29 2020-09-08 电子科技大学 一种自监督的对话文本摘要方法及***
CN112464658A (zh) * 2020-12-07 2021-03-09 上海交通大学 基于语句融合的文本摘要生成方法、***、终端及介质
CN113919367A (zh) * 2021-09-09 2022-01-11 中国科学院自动化研究所 摘要获取方法、装置、设备、介质及产品
US20220067269A1 (en) * 2020-08-31 2022-03-03 Twilio Inc. Language model for abstractive summarization

Also Published As

Publication number Publication date
CN116933801A (zh) 2023-10-24

Similar Documents

Publication Publication Date Title
Chung et al. Speech2vec: A sequence-to-sequence framework for learning word embeddings from speech
Wang et al. Machine comprehension using match-lstm and answer pointer
Ray Chowdhury et al. Keyphrase extraction from disaster-related tweets
CN112131366A (zh) 训练文本分类模型及文本分类的方法、装置及存储介质
US20170351663A1 (en) Iterative alternating neural attention for machine reading
US10891539B1 (en) Evaluating content on social media networks
CN115048944B (zh) 一种基于主题增强的开放域对话回复方法及***
EP3566151A1 (en) Generating responses in automated chatting
EP3563302A1 (en) Processing sequential data using recurrent neural networks
CN110781302A (zh) 文本中事件角色的处理方法、装置、设备及存储介质
WO2020191828A1 (zh) 基于图的上下文关联回复生成方法、计算机及介质
CN109190109A (zh) 融合用户信息生成评论摘要的方法及装置
CN110895656A (zh) 一种文本相似度计算方法、装置、电子设备及存储介质
CN116186216A (zh) 基于知识增强和双图交互的问题生成方法及***
CN115438149A (zh) 一种端到端模型训练方法、装置、计算机设备及存储介质
Baloglu et al. Assessment of supervised learning algorithms for irony detection in online social media
Song et al. SUNET: Speaker-utterance interaction graph neural network for emotion recognition in conversations
Kondurkar et al. Modern applications with a focus on training chatgpt and gpt models: Exploring generative ai and nlp
WO2023185912A1 (zh) 对话摘要生成方法和装置、模型训练方法和设备
CN117076608A (zh) 一种基于文本动态跨度的整合外部事件知识的脚本事件预测方法及装置
Illendula et al. Which emoji talks best for my picture?
WO2023040545A1 (zh) 一种数据处理方法、装置、设备、存储介质和程序产品
Atri et al. Promoting Topic Coherence and Inter-Document Consorts in Multi-Document Summarization via Simplicial Complex and Sheaf Graph
Thanarattananakin et al. Spam detection using word embedding-based LSTM
Sugiyama Empirical feature analysis for dialogue breakdown detection

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23778277

Country of ref document: EP

Kind code of ref document: A1