WO2021150313A1

WO2021150313A1 - Contrastive learning for question answering (qa)

Info

Publication number: WO2021150313A1
Application number: PCT/US2020/064144
Authority: WO
Inventors: Ming GONG; Ze YANG; Linjun SHOU; Daxin Jiang
Original assignee: Microsoft Technology Licensing, Llc
Priority date: 2020-01-20
Filing date: 2020-12-10
Publication date: 2021-07-29
Also published as: CN113139119A

Abstract

The present disclosure relates to contrastive learning for question answering (QA), and proposes methods and apparatuses for providing contrastive training data. A positive example may be obtained from a training data set, the positive example comprising a first text and a second text labelled as relevant. Contrastive information may be extracted from a search log. The first text may be amended based at least on the contrastive information. The amended first text and the second text may be combined into a negative example which is contrastive to the positive example, the amended first text and the second text being labelled as irrelevant in the negative example.

Description

CONTRASTIVE LEARNING FOR QUESTION ANSWERING (QA)

BACKGROUND

[0001] A search engine may provide search results for a user query in a search result page (SERP). Traditional search results include links to the most relevant web documents with respect to the user query. Herein, a web document may also be referred to as, e.g., a web page. A link may refer to a hyperlink, a web address, a URL, etc. In order to find answers relevant to the query, a user needs to view search results, click on links to web documents, and browse the presented web documents. In recent years, some web search engines begin to provide a question answering (QA) service, which is also referred to as a web QA service. The QA service provides a more efficient information access mechanism, which extracts the most relevant passage from a web document and directly presents the content of the passage to a user. For example, if a user query has question intent, a web search engine will extract the most relevant passage from a web document, and place the passage within an individual QA block in a SERP. The passage may refer to one or more sentences, one or more passages, abstract, etc., extracted from the corresponding web document. The QA service is becoming more and more popular for search engine users and is becoming an important service provided by search engines.

SUMMARY

[0002] This Summary is provided to introduce a selection of concepts that are further described below in the Detailed Description. It is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter.

[0003] Embodiments of the present disclosure propose methods and apparatuses for providing contrastive training data. A positive example may be obtained from a training data set, the positive example comprising a first text and a second text labelled as relevant. Contrastive information may be extracted from a search log. The first text may be amended based at least on the contrastive information. The amended first text and the second text may be combined into a negative example which is contrastive to the positive example, the amended first text and the second text being labelled as irrelevant in the negative example.

[0004] It should be noted that the above one or more aspects comprise the features hereinafter fully described and particularly pointed out in the claims. The following description and the drawings set forth in detail certain illustrative features of the one or more aspects. These features are only indicative of the various ways in which the principles of various aspects may be employed, and this disclosure is intended to include all such aspects and their equivalents.

BRIEF DESCRIPTION OF THE DRAWINGS

[0005] The disclosed aspects will hereinafter be described in connection with the appended drawings that are provided to illustrate and not to limit the disclosed aspects. [0006] FIG.l illustrates an exemplary search result page.

[0007] FIG.2 illustrates an exemplary process for providing contrastive training data according to an embodiment.

[0008] FIG.3 illustrates an exemplary process for providing contrastive training data through a Web Knowledge based Method (WKM) according to an embodiment.

[0009] FIG.4 illustrates an exemplary process for extracting candidate options according to an embodiment.

[0010] FIG.5 illustrates exemplary semi-structured data according to an embodiment. [0011] FIG.6 illustrates an exemplary process for providing contrastive training data through a User Feedback based Method (UFM) according to an embodiment.

[0012] FIG.7 illustrates a flowchart of an exemplary method for providing contrastive training data according to an embodiment.

[0013] FIG.8 illustrates an exemplary apparatus for providing contrastive training data according to an embodiment.

[0014] FIG.9 illustrates an exemplary apparatus for providing contrastive training data according to an embodiment.

DETAILED DESCRIPTION

[0015] The present disclosure will now be discussed with reference to several example implementations. It is to be understood that these implementations are discussed only for enabling those skilled in the art to better understand and thus implement the embodiments of the present disclosure, rather than suggesting any limitations on the scope of the present disclosure.

[0016] Generally, a QA system for providing the Web QA service may be configured in a search engine, to provide a passage most relevant to a query in a SERP. The QA system may include a QA model, which is also referred to as a QA relevance model. The QA model is used for, for each candidate passage, providing a relevance score between the candidate passage and the query. Therefore, the QA system may select the passage most relevant to the query based on a relevance score of each candidate passage, and present the passage to the user in a QA block in the SERP.

[0017] Web QA may be understood as a text matching task, and the text matching may broadly refer to techniques used for identifying whether a pair of texts is semantically relevant. Some conventional approaches adopt an information retrieval (IR) model as a QA model to provide the QA service. The IR model may include, e.g., vector space model, BM 25 model, language model for IR, etc. In other approaches, in order to deal with the diversity of user queries, a neural network model is adopted as a QA model. The neural network model may also refer to deep learning model, deep neural network model, etc. The neural network model encodes semantic meaning of a query’s text into a vector. By mapping similar expressions to close positions in the vector space, the neural network model may recall a passage relevant to the query more accurately. Moreover, a deep pre training approach may also be adopted for further improving the performance of the neural network model through context embedding.

[0018] The neural network model captures the semantic similarity among texts based on distributional hypothesis, e.g., it would deem that linguistic items with similar distributions have similar meanings. Consequently, although the neural network model may successfully learn that "kid" is similar to "children", it may, at the same time, also consider that "the elderly" is similar to "kid" and "children", since the set of words that often co-occur with "the elderly" may also likely appear in the context of "kid" or "children". In the case of performing word embedding through word2vec, taking the word "adult" as an example, the closest words in the vector space may include "youth", "children", "the elderly", etc. It can be seen that the word embedding technology not only clusters synonyms, but also easily clusters other words in the same category in the vector space. Even if deep context embedding is applied as in the deep pre-training approach, the neural network model may still consider that, e.g., "children" and "the elderly" are similar, since contexts of these two words usually overlap with each other.

[0019] Therefore, the neural network model may not have sufficient sensitivity to distinguish words that have relevant attributes or categories but have dissimilar meanings. For example, the two words "children" and "the elderly" are both related to the category "person" or to the attribute "person", but the meanings of them are not similar. In the scenario of the web QA, lack of such sensitivity may cause poor user experiences. For example, if a user query is "cold treatment for children", a search engine may provide a passage about how to treat children's cold, which is relevant to the query, i.e., a good answer to the query. However, when a user query is "cold treatment for the elderly", since the neural network model cannot effectively distinguish between "the elderly" and "children", the search engine may still provide the passage about how to treat children's cold, which would be irrelevant to the query, i.e., an inappropriate answer to the query. [0020] Generally, adversarial training is applied for the neural network model. The adversarial training aims to make small perturbations that do not change the meaning of an input text will not cause significant changes to an output by the model. For example, an adversarial instance may be generated by amending words in the original text. However, the adversarial training can only enhance the robustness of the model, but cannot be used for improving the sensitivity of the model.

[0021] Embodiments of the present disclosure propose to enhance sensitivity of a QA model through contrastive learning, e.g., sensitivity of a neural network model used for a web QA task, so that the model can have a capability of effectively distinguishing words that have relevant attributes or categories but dissimilar meanings. Herein, the contrastive learning may include, e.g., automatically constructing or providing contrastive training data for enhancing the model’s sensitivity, and training the model with the automatically constructed contrastive training data. Moreover, herein, the "contrastive" is used for describing relationship between two texts, e.g., two contrastive texts may refer to that these two texts are relevant in attributes or categories but not similar in meaning. For example, "children" and "the elderly" are two contrastive words.

[0022] Training data set used for training the QA model may include training data in the form of <q, p, label>, wherein q denotes a query or question, p denotes a passage, and "label" indicates the relevance between q and p. When q and p are labelled as relevant, the <q, p> pair may be considered as a positive example, and when q and p are labelled as irrelevant, the <q, p> pair may be considered as a negative example. Given a positive example <qi, p\> in the training data set, wherein q_\ and pi are labelled as relevant, the embodiments of the present disclosure may automatically generate a contrastive query qi' of qi, and construct a negative example <q\ p\> contrastive to the positive example <qi, pi>, wherein qi and pi are labelled as irrelevant. In this example, since qi and qi are contrastive, qi deviates from q_\ in terms of meaning. The constructed negative example may be added to the training data set to be used for training the QA model. As a specific example, assuming that qi is "cold treatment for children" and pi is a passage about how to treat children's cold, the embodiments of the present disclosure may construct qi! as, e.g., "cold treatment for the elderly", and accordingly form a negative example composed of the query "cold treatment for the elderly" and the passage about how to treat children's cold. Through generating negative examples that are contrastive to positive examples in the training data set, the QA model can not only learn from the positive examples what information should be associated together, but also learn from the constructed contrastive negative examples what information should be distinguished, e.g., distinguishing words that are relevant in attributes or categories but not similar in meaning.

[0023] The embodiments of the present disclosure may mine contrastive information from a search log, and construct negative examples that are contrastive to positive examples in the training data set with the contrastive information. Two unsupervised methods for automatically constructing contrastive training data are proposed, e.g., Web Knowledge based Method (WKM) and User Feedback based Method (UFM). The WKM may generate contrastive training data with a contrastive word pair set, which is mined from the search log, through word replacement. For example, the WKM may obtain candidate options from the search log, cluster the candidate options into multiple groups at least with a semi-structured data corpus collected on the web, and form contrastive word pairs from candidate options included in each group. The WKM may amend a query in a positive example with the mined contrastive word pair set, e.g., replacing words in the query to form a negative example. The UFM may select a contrastive query that is contrastive to a query in a positive example based at least on search records in the search log, e.g., displayed links, user click behaviors, etc., and form a negative example with the selected contrastive query.

[0024] It should be understood that the embodiments of the present disclosure are not limited to construct contrastive <query, passage> pairs in the QA scenario, but may be more broadly used for constructing contrastive text pairs <text G, text 2> for various types of text pairs <text 1, text 2> in other application scenarios. For example, a search engine may need to determine whether two queries have the same meaning, or any other model may need to determine whether two texts have the same meaning, therefore, through constructing training text pairs acting as negative examples that are contrastive to training text pairs acting as positive examples according to the embodiments of the present disclosure, the model’s sensitivity in determining textual meaning similarity may be enhanced. Moreover, for example, when a search engine needs to classify queries or any other model needs to classify texts, contrastive training data constructed according to the embodiments of the present invention may facilitate to improve the model’s ability of recognizing changes in meanings caused by word changes in texts, thereby enhancing the model’s sensitivity. Therefore, although the construction of contrastive query-passage pair training data is taken as an example in some parts of the following discussion, the same or similar process may also be applied for scenarios of constructing any other contrastive text pair training data.

[0025] FIG.l illustrates an exemplary search result page (SERP) 100. The SERP 100 may be presented to a user in a user interface by a search engine in response to the user's query or question. Components in the SERP 100 may be exemplarily divided into a search block 110, a QA block 120, a relevant question block 130, a web page link block 140, etc. Here, the blocks are only different logical divisions of the components in the SERP 100, and in terms of display and function, different blocks and components therein may be independent from or combined with each other.

[0026] In the search block 110, the user may enter a query, e.g., "summer flu treatment".

[0027] In response to determining that the user input in the search block 110 has question intent, the search engine may provide the QA block 120 in the SERP 100. The QA block 120 may include, e.g., a passage 122 for answering the user query, an extension option 124 of the passage 122, a source page link 126 of the passage 122, etc. The passage 122 is content that is extracted from a web document and is most relevant to the user query. For example, in FIG.l, the passage 122 may include multiple tips for treating summer cold. Due to the limitation of display size of a page, the passage 122 may only be partially displayed. In this case, the user may click on the extension option 124, e.g., a "More items" link, to view the hidden parts of the passage 122. The source page link 126 is a hyperlink to a source page or a source web document from which the passage 122 is extracted. When the user clicks on the source page link 126, the source page of the passage 122 may be presented in the user interface. Moreover, optionally, the SERP 100 may further include a feedback button or link 128 for collecting satisfaction feedbacks provided by the user for the passage 122.

[0028] The relevant question block 130 may include questions relevant to or similar to the user query in the search block 110. These relevant questions may include, e.g., queries frequently searched by other users. In FIG.l, multiple questions relevant to the user query "summer flu treatment" are shown in the relevant question block 130, e.g., "What causes summer flu?", "Medicines for summer flu?", etc. When the user clicks on a relevant question, the search engine may initiate a search for the clicked relevant question and present a corresponding SERP in the user interface.

[0029] The web page link block 140 includes hyperlinks to web pages or web documents relevant to the user query in the search block 110. The web page links in the web page link block 140 may be ranked by the search engine based on document relevance. When the user clicks on a web page link, the web page may be presented in the user interface.

[0030] It should be understood that all the blocks and components in the SERP 100 in FIG.1 are exemplary, and according to specific designs and application requirements, the SERP 100 may include more or fewer blocks and components, and these blocks and components may be laid out and presented in any other approaches.

[0031] FIG.2 illustrates an exemplary process 200 for providing contrastive training data according to an embodiment. The process 200 may be performed for automatically generating a contrastive negative example for a text pair acting as a positive example in a training data set 210, so as to expand training data in the training data set 210.

[0032] The training data set 210 may include training data for training a target model. The target model may be various models related to prediction of text relevance, e.g., a QA model in a QA system, a text meaning comparison model, a text classification model, etc. The training data in the training data set 210 may take the form of <text 1, text 2, label>, wherein the "label" indicates the relevance between text 1 and text 2. When being labelled as "relevant", the text pair may be considered as a positive example, and when being labelled as "irrelevant", the text pair may be considered as a negative example.

[0033] In the process 200, assuming that an exemplary positive example 212 "<text 1, text 2, relevant>" is taken from the training data set 210, in which text 1 and text 2 are labelled as relevant, and assuming that it is expected to generate a negative example that is contrastive to the positive example 212.

[0034] In practical applications, there may be a large amount of interactions between users and a search engine, and information related to these interactions may be stored in a search log 220. The search log 220 may include a search record for each query. A search record may include various types of information in a SERP provided in response to a query, e.g., the query, a passage provided for the query, web links provided for the query, etc. The search record may also include various user behaviors on the SERP, e.g., clicking on a web page link by a user, etc.

[0035] At 230, contrastive information 240 may be extracted from the search log 240. Herein, the contrastive information may refer to various types of information that facilitate to generate contrastive training data. In an implementation, the contrastive information 240 may include a contrastive word pair set which may be used for generating contrastive training data in the WKM. In an implementation, the contrastive information 240 may include contrastive queries which may be used for generating contrastive training data in the UFM.

[0036] At 250, text 1 in the positive example 212 may be amended based at least on the contrastive information 240. Text 1 may be amended in different approaches. For example, words in text 1 may be replaced by words in the contrastive word pair set. For example, text 1 may be directly replaced by a contrastive query.

[0037] The amended text 1 may be used for forming a negative example 252 that is contrastive to the positive example 212. For example, the amended text 1 and text 2 in the positive example 212 may be combined into a text pair <am ended text 1, text 2>. Since the amended text 1 is contrastive to text 1, the amended text 1 and text 2 may be labelled as irrelevant. Accordingly, the negative example 252 "<amended text 1, text 2, irrelevant" is formed. The negative example 252 may act as contrastive training data that is contrastive to the positive example 212.

[0038] Through repeatedly performing the process 200, a contrastive training data set including a plurality of contrastive training data may be automatically obtained.

[0039] The process 200 may further add the contrastive training data in the contrastive training data set to the training data set 210. The contrastive training data set may be added to the training data set 210 in different approaches. In an implementation, the contrastive training data set may be simply appended to the training data set 210 as additional training data in addition to the original training data in the training data set 210. In an implementation, the training data set 210 may be updated by using at least a portion of the contrastive training data in the contrastive training data set to replace a portion of the original negative examples in the training data set 210, so as to ensure a balance between the number of positive examples and the number of negative examples in the training data set while guiding sensitivity training and ensuring the original accuracy of the model. Negative examples in the updated training data set may be configured according to the following equation:

X_n(_.X_n>X^*) = sample(A_n, a|Z_n|) U sample^^*, (1 - a) \X_n\) Equation (1) wherein X_n is the original negative example set in the training data set, X^* is the contrastive training data set generated by the process 200, X_n(·) is the final negative example set in the updated training data set, sample(L, K) is a sampling function for sampling K instances from the source L, \X_n\ denotes the number of negative examples in X_n , and a is a sampling coefficient which is used for controlling a ratio of instances selected from the original negative examples and from the contrastive training data. For example, when a=0.2, this means that 20% of the original negative examples in the training data set are replaced by the contrastive training data generated by the process 200. [0040] It should be understood that all the operations, steps, and their sequences in the process 200 are exemplary, and according to specific application requirements and designs, the process 200 may be amended in any approaches. For example, although the process 200 only shows generating one negative example 252 for the positive example 212, it is also possible to perform multiple amendments to text 1 at 250 with different contrastive information, and obtain different versions of the amended text 1, thereby, multiple negative examples may be generated for the positive example 212. Moreover, depending on a task to be performed by the trained model, the text pair <text 1, text 2> directed by the process 200 may be any text pair involving text relevance prediction, e.g., a <query, passage> pair in the QA scenario, a <query, query> pair in the scenario of comparing meanings of queries by a search engine, a <sentence, sentence> pair or a <word, word> pair in the scenario of comparing meanings of texts by a general language model, etc. [0041] FIG.3 illustrates an exemplary process 300 for providing contrastive training data through a WKM according to an embodiment. The process 300 is an exemplary specific implementation of the process 200 in FIG.2. For example, the process of mining a contrastive word pair set from a search log in FIG.3 may be regarded as a specific implementation of the process of extracting the contrastive information from the search log in FIG.2.

[0042] The search log 310 may include records of search sessions of users. A search session may refer to a search process for one or more interrelated queries established between a search engine and a user. A search session involving more than one query may be referred to as a multi-turn search session. In a multi-turn search session, the search engine may first receive an initial query entered by a user, i.e., a first-turn query. After presenting a SERP for the first-turn query, the user may desire to amend the first-turn query in order to obtain further information, and thus initiate a second-turn query associated with the first-turn query. After obtaining the second-turn query, the search engine may perform search and return a SERP for the second-turn query. Similarly, the user may continue to provide further queries. Since a series of queries in a multi-turn search session will be good candidates for generating contrastive word pairs, the process 300 may generate a contrastive word pair set based on multi -turn search sessions in the search log.

[0043] At 320, candidate option extraction may be performed in the search log 310. For example, at least one multi -turn search session may be first extracted from the search log. In an implementation, the at least one multi-turn search session may have the same first-turn query. Candidate options may be extracted from the extracted at least one multi turn search session. A candidate option may refer to a candidate that may act as a word in a contrastive word pair set. It should be understood that, herein, the "word" may broadly refer to character, word, phrase, etc. As an example, at 320, candidate option 1, candidate option 2, candidate option 3, ..., etc. may be extracted from the search log 310. FIG.4 illustrates an exemplary process 400 for extracting candidate options according to an embodiment.

[0044] A search log 410 in FIG.4 may correspond to the search log 310 in FIG.3. Assuming that a plurality of multi-turn search sessions including the same first-turn search "diabetes" are extracted from the search log 410, e.g., Session 1, Session 2, and Session 3. Session 1 includes multiple turns of query, e.g., a first-turn query 422 "diabetes", a second-turn query 424 "type 1 diabetes", a third-turn query 426 "diabetes symptoms", etc. Session 2 includes multiple turns of query, e.g., a first-turn query 432 "diabetes", a second-turn query 434 "diabetes treatment", a third-turn query 436 "diabetes treatment for female", etc. Session 3 includes multiple turns of query, e.g., a first-turn query 442 "diabetes", a second-turn query 444 "type 1 diabetes symptoms", a third-turn query 446 "type 1 diabetes fatigue", etc.

[0045] In the process 400, for each multi-turn search session, those words shared between every two adjacent queries may be further extracted as a body, and those words not shared between every two adjacent queries may be extracted as candidate options. In an implementation, Longest Common Sequence (LCS) may be used for detecting shared words in two adjacent queries. Taking Session 1 as an example, the query 422 "diabetes" and the query 424 "type 1 diabetes" share a LCS "diabetes", and an entry Bi corresponding to this LCS "diabetes" may be established in a body set B , and other words in the two queries, e.g., "type 1", may be stored as candidate options in a subset 0\ corresponding to B\ in a candidate option set O. Similarly, the query 424 "type 1 diabetes" and the query 426 "diabetes symptoms" share a LCS "diabetes", and there is already an entry Bi corresponding to the LCS "diabetes" in the body set /i, therefore, other words in the two queries, e.g., "type 1" and "symptoms", may be stored as candidate options in the subset 0\ corresponding to Bi, wherein since the candidate option "type 1" already exists in the subset O _l, repeated storage of "type 1" may be avoided. Through performing the body and candidate option extraction for each multi-turn search session described above, body information and candidate option information as shown in the table at the bottom of FIG.4 may be obtained, e.g., the candidate option set ()_\ 'type 1, symptoms, treatment, female} corresponding to the body entry Bi "diabetes", a candidate set (h (symptoms, fatigue} corresponding to a body entry Bi "type 1 diabetes", a candidate option set (h {female} corresponding to a body entry

"diabetes treatment", etc. The information in the table may be indexed, and the table may be further represented as {(Bi, Oi_j)\i = 1, 2,

= 1, 2 , . .,N) , wherein B_t denotes the z-th body, M is the number of bodies, 0 denotes the j- th candidate option in the candidate option set O_i corresponding to the z-th body, and N is the number of candidate options in the candidate option set corresponding to the z-th body.

[0046] Returning to FIG.3, the candidate options extracted at 320 may include, e.g., the candidate options in the candidate option sets O i, (h and (h in FIG.4.

[0047] In some cases, the candidate options extracted at 320 may not be suitable for direct use in forming contrastive word pairs. In one aspect, some candidate options may not belong to the same category or attribute and cannot be used for forming contrastive word pairs. For example, "type 1" and "female" do not belong to the same category or attribute, it will be meaningless to generate a word pair <type 1, female> which is not a contrastive word pair either. In another aspect, some candidate options may be synonyms and should not be used for forming contrastive word pairs. For example, "woman" and "female" are synonyms and have similar meanings, therefore, it should be avoided to form a word pair <woman, female>. Considering the situation described above, the process 300 may also include data optimization for the extracted candidate options.

[0048] At 330, group clustering may be performed to the candidate options. For example, the candidate options extracted at 320 may be clustered into Group 1, Group 2, ..., etc. Each group may include one or more candidate options with the same category or attribute.

[0049] In an implementation, a semi-structured data corpus 332 prepared in advance may be used for performing the group clustering. The semi -structured data corpus 332 may include various types of semi-structured data obtained from the web, e.g., web table, web list, web menu, etc. Usually, candidate options belonging to the same category or attribute will appear together in the same semi -structured data. FIG.5 illustrates exemplary semi -structured data according to an embodiment. A web table 512 is displayed on a web page 510. As highlighted by dashed lines, the web table 512 includes words "Stage 1", "Stage 2", "Stage 3", "Stage 4", etc. belonging to the same category or attribute. Moreover, a web list 522 is displayed on a web page 520. As highlighted by dashed lines, the web list 522 also includes the words "Stage 1", "Stage 2", "Stage 3", "Stage 4", etc. belonging to the same category or attribute.

[0050] The similarity between two candidate options may be calculated based at least on occurrence information of the two candidate options in the semi-structured data corpus 332. For example, given a candidate option pair (o_;, of) and semi -structured data dED, wherein Oi and o, are two candidate options extracted at 320, and D represents a semi- structured data corpus. According to the inclusion relationship between the candidate options and the semi-structured data, i.e., occurrence of the candidate options in the semi- structured data, the following definition may be given:

Equation (2)

wherein X E {1, 0} indicates whether d_t includes O_j , and \D\ is the number of semi- structured data in the corpus D.

[0051] Further, two distributions as following may be defined:

Equation (3) Equation (4)

wherein X e {1, 0} indicates whether d E D includes

and Y e {1, 0} indicates whether d E D includes O_{j .}

[0052] Based on the distributions described above, mutual information in the following equation may be used for calculating a similarity score between the two candidates (o_;, of): Equation (5)

[0053] Given a candidate option set O, groups may be generated through a greedy clustering approach, and a group corresponding to each candidate option may be determined. Firstly, o₁ E O may be selected as a group. Then, o_t E O (i ¹ 1) is traversed, and a similarity score between o_L and each existing group G_j E C is calculated, wherein the calculation equation is, e.g., Equation (6)

wherein G_j is an existing group, \G_j \ is the number of candidate options in the group G_j , and o_{G . n} is the n- th candidate option in the group G_{j .} If the maximum similarity score S(oi, G_j ) < t, wherein t is a predetermined threshold derived from an empirical average value of a group of seed contrastive word pairs, then o_L may be considered as a new group, otherwise, o_L will be added to the group G_j having the maximum similarity score.

[0054] The Table 1 below shows an exemplary process of determining a group corresponding to each candidate option through the greedy clustering approach.

Table 1 [0055] Through the clustering at 330, a group set C = {G₁, G₂, —, G\_C\} may be obtained, wherein |C| is the number of the groups. For example, the groups in the group set C may correspond to Group 1, Group 2, etc., in FIG.3.

[0056] Optionally, at 340, inner-group deduplication may be performed to the groups generated at 330, so as to remove synonymic candidate options. A group including two or more synonymic candidate options may be first identified, and then only one of the two or more synonymic candidate options may be retained in the group. For example, if a group includes the synonyms "woman" and "female", one of these two words, e.g., "woman", may be removed, and only the other word "female" is retained in the group. In an implementation, WordNet may be used for removing synonyms from a group. Through the processing at 340, a group set C' = {G^, G₂ G\_c,\'} without synonyms may be obtained, which may correspond to Group G, Group 2', etc. in FIG.3.

[0057] At 350, any two candidate options in each group may be combined into a contrastive word pair. In the case of having performed the inner-group deduplication at 340, multiple contrastive word pairs 352-1 may be generated based on the candidate options in Group G, multiple contrastive word pairs 352-2 may be generated based on the candidate options in Group 2', etc. All the obtained contrastive word pairs may form a contrastive word pair set 352. For example, any two candidate options in each group G_k E C' may be paired together, so as to obtain a contrastive word pair set PairSet = {(O_j, O_j)\i,j E G_k'). It should be understood that, in the case of having not performing the inner-group deduplication at 340, the contrastive word pair set 352 may be directly generated based on Group 1, Group 2, etc., obtained at 330.

[0058] The contrastive word pair set 352 is an example of the contrastive information extracted from the search log. Therefore, step 320 to step 350 in the process 300 may be regarded as an exemplary implementation of step 230 in FIG.2.

[0059] After the contrastive word pair set 352 is obtained, a positive example 370 from the training data set may be amended with the contrastive word pair set 352. For example, a target word in text 1 of the positive example 370 may be identified at 360, and the target word is also included in a contrastive word pair 360-1 in the contrastive word pair set 352. In an implementation, unigram words, bigram words, trigram words, etc. in text 1 may be traversed, and a target word in text 1 that matches a word in a contrastive word pair in the contrastive word pair set 352 may be found. The other word in this contrastive word pair may be regarded as a contrastive word of the target word in text 1. At 380, the target word in text 1 may be replaced by the contrastive word in the contrastive word pair 360-1, so that text 1 is amended. Accordingly, a negative example 390 that is contrastive to the positive example 370 may be obtained, which includes a text pair that is labelled as irrelevant, and is composed of the amended text 1 and text 2.

[0060] It should be understood that multiple contrastive word pairs may be determined for a target word at 360, thus multiple negative examples may be formed with these contrastive word pairs. Moreover, there may be multiple target words in text 1, thus amendments to text 1 may be performed for these target words respectively, and multiple negative examples may be formed accordingly.

[0061] As an example, assuming that the positive example 370 is a text pair composed of a query "cold treatment for children" and a passage about how to treat children’s cold, through the process 300, it may be determined that the "children" in the query is a target word and is included in a contrastive word pair such as <children, the elderly>, therefore, the "children" in the query may be replaced by "the elderly", and a negative example composed of a query "cold treatment for the elderly" and a passage about how to treat children's cold may be constructed.

[0062] FIG.6 illustrates an exemplary process 600 for providing contrastive training data through a UFM according to an embodiment. The process 600 is an exemplary specific implementation of the process 200 in FIG.2. For example, the process of mining contrastive queries from a search log in FIG.6 may be regarded as a specific implementation of the process of extracting the contrastive information from the search log in FIG.2.

[0063] A search log may not only record search results for a user’s query, but also record user behaviors when the user interacts with a SERP, e.g., clicking on a web link, etc. Two different queries may be associated by click information. Take two queries q = B o o_x and q₂ = B ° o₂ as an example, wherein B denotes words shared by the two queries, o_x and o₂ denote other words in the two queries respectively, and ° denotes a cascade of words. If these two queries share or co-display some web links in their respective SERPs, but the web links clicked by users are significantly different, it is very likely that the words o_x and o₂ are contrastive to each other, and thus q_i and q₂ are also contrastive. For example, assuming that q_i is "how many calories are in cola", a SERP for q- includes web links {URLl, URL2, URL3}, and the clicked web link is URLl, meanwhile, assuming that q₂ is "how much nutrition is in cola", a SERP for q₂ includes web links {URL2, URL3, URL5}, and the clicked web link is URL5. These two queries ask about "calories" in cola and "nutrition" in cola respectively, thus they have different meanings. The web links URL2 and URL3 are shared in the SERPs of these two queries, but users click on different web links URLl and URL5 respectively. This indicates that intents of the two queries are different. Therefore, these two queries are likely to be contrastive to each other.

[0064] The process 600 may be performed for generating a negative example that is contrastive to a positive example 610 in a training data set. At 620, at least one relevant query 624 which is relevant to text 1 in the positive example 610 may be determined from a search log 622. The determined at least one relevant query 624 may be uniformly denoted as a relevant query set Q_r. In the QA scenario, text 1 and text 2 in the positive example 610 may be a query and a passage respectively. In an implementation, a query- based inverted index may be constructed for performing fast query retrieval in the search log, and BM25 may be used for ranking.

[0065] At 630, search records corresponding to the positive example 610 and search records corresponding to each relevant query in Q_r may be extracted from the search log 622. The search records may include, e.g., queries, passages, web page links, click behaviors on web page links, etc.

[0066] At 640, contrastive parameter values between text 1 in the positive example 610 and each relevant query in Q_r may be calculated based at least on the search records corresponding to the positive example 610 and the search records corresponding to each relevant query in Q_r. For example, the number of co-displayed links and the number of co clicked links between text 1 and each relevant query may be determined based on the search records, and contrastive parameter values between text 1 and the relevant query may be calculated based at least on the number of co-displayed links and the number of co-clicked links.

[0067] Text 1 may be denoted as q , and co-displayed link information, co-clicked link information, etc. between q and a relevant query q_r in Q_r may be calculated by the following equations:

CoDisplay(q, q_r ) = U(q) (Ί t/(q_r) Equation (7) UnionDisplay(q, q_r) = U(q) U t/(q_r) Equation (8) CoClick(q, q_r ) = Click(q ) n Click(q_r ) Equation (9) wherein q_r e Q_r, i7 ( ) denotes a link list provided in a SERP for q , t/(q_r) denotes a link list provided in a SERP for q_r, Click(q ) denotes a link list clicked in the SERP for q , Click q_r ) denotes a link list clicked in the SERP for q_r, CoDisplay(q, q_r ) denotes a link list co-displayed in the SERPs for q and q_r , UnionDisplay(q, q_r ) denotes all the link lists displayed in the SERPs for q and q_r , and CoClick(q, q_r ) denotes a link list co-clicked in the SERPs for q and q_r.

[0068] The more links displayed and shared by q and q_r , the more relevant q and q_r might be. However, if the co-clicked links differ a lot, the difference between intents or meanings of q and q_rmay also be significant. Therefore, normalization coefficients may be defined as follows:

Equation (10) Equation (11)

[0069] The normalization coefficients I_c(q, q_r) and l_r(q, q_r ) defined above may be considered as examples of the contrastive parameters. Accordingly, values of the normalization coefficients calculated according to the above equations may act as the contrastive parameter values between q and q_r.

[0070] At 650, at least one contrastive query may be determined from Q_r based on comparison between the calculated contrastive parameter values and predetermined criteria. For example, a relevant query with contrastive parameter values that meet the predetermined criteria may be selected from Q_r as a contrastive query. In an implementation, the final contrastive query set may be denoted as Q_u, and queries to be added to Q_u may be selected through the following equation: Equation (12)

wherein /( ) is a signal function which indicates whether q_r should be added to Q_u corresponding to q, e.g., /( ) = 1 indicates that q_r should be added to ¾, and /( ) = 0 indicates that q_r should not be added to Q_u. t₁ and t₂ are predetermined thresholds, e.g., t₁ may be set to 0, and t₂ may be set to 0.4. The above Equation (12) may be considered as an example of the predetermined criteria. It should be understood that the embodiments of the present disclosure may also adopt any other forms of predetermined criteria.

[0071] The contrastive query is an example of the contrastive information extracted from the search log. Therefore, step 620 to step 650 in the process 600 may be regarded as an exemplary implementation of step 230 in FIG.2.

[0072] After the contrastive query is obtained, the positive example 610 may be amended with the contrastive query. For example, at 660, text 1 in the positive example 610 may be directly replaced by the contrastive query. Accordingly, a negative example 670 that is contrastive to the positive example 610 may be obtained, which includes a text pair that is labelled as irrelevant and is composed of the contrastive query and text 2. It should be understood that if multiple contrastive queries are determined at 650, multiple negative examples may be formed with these contrastive queries, respectively.

[0073] As an example, assuming that the positive example 610 is a text pair composed of a query "cold treatment for children" and a passage about how to treat children’s cold, through the process 600, a contrastive query such as "how to treat cold for the elderly" may be determined, and a negative example composed of the query "how to treat cold for the elderly" and the passage about how to treat children's cold may be constructed.

[0074] FIG.7 illustrates a flowchart of an exemplary method 700 for providing contrastive training data according to an embodiment.

[0075] At 710, a positive example may be obtained from a training data set, the positive example including a first text and a second text labelled as relevant.

[0076] At 720, contrastive information may be extracted from a search log.

[0077] At 730, the first text may be amended based at least on the contrastive information.

[0078] At 740, the amended first text and the second text may be combined into a negative example which is contrastive to the positive example, the amended first text and the second text being labelled as irrelevant in the negative example.

[0079] In an implementation, the extracting contrastive information from a search log may comprise: extracting at least one multi-turn search session from the search log; and generating a contrastive word pair set with queries in the at least one multi-turn search session.

[0080] The at least one multi-turn search session may have the same first-turn query. [0081] The generating a contrastive word pair set may comprise: extracting candidate options from the queries in the at least one multi-turn search session; clustering the candidate options into one or more groups with a semi -structured data corpus; and combining any two candidate options in each group into a contrastive word pair.

[0082] The extracting candidate options may comprise: for each multi-turn search session, extracting words not shared in every two adjacent queries as the candidate options.

[0083] The clustering may comprise: for two target candidate options in the candidate options, calculating similarity between the two target candidate options based at least on occurrence information of the two target candidate options in the semi -structured data corpus.

[0084] The clustering may comprise: determining, through a greedy clustering approach, a group to which each candidate option in the candidate options corresponds. [0085] The semi -structured data in the semi-structured data corpus may belong to at least one type of: web table, web list and web menu.

[0086] The method 700 may further comprise: identifying a group including two or more synonymic candidate options; and retaining, in the group, only one candidate option in the two or more synonymic candidate options.

[0087] The amending the first text may comprise: identifying a target word which is included in the first text and included in a contrastive word pair in the contrastive word pair set; and replacing, in the first text, the target word by another word in the contrastive word pair.

[0088] In an implementation, the extracting contrastive information from a search log may comprise: determining a contrastive query corresponding to the first text from the search log.

[0089] The determining a contrastive query may comprise: determining, from the search log, at least one relevant query which is relevant to the first text; for each relevant query, calculating contrastive parameter values between the first text and the relevant query based at least on a search record corresponding to the positive example and a search record corresponding to the relevant query in the search log; and selecting, from the at least one relevant query, a relevant query which has contrastive parameter values conforming to predetermined criteria as the contrastive query.

[0090] The calculating contrastive parameter values may comprise: determining the number of co-displayed links and the number of co-clicked links between the first text and the relevant query, based on the search record corresponding to the positive example and the search record corresponding to the relevant query; and calculating the contrastive parameter values between the first text and the relevant query based at least on the number of co-displayed links and the number of co-clicked links.

[0091] The amending the first text may comprise: replacing the first text by the contrastive query.

[0092] In an implementation, the training data set may be for training a QA model, the first text corresponding to a query, the second text corresponding to a passage.

[0093] It should be understood that the method 700 may further comprise any step/process for providing contrastive training data according to the embodiments of the present disclosure described above.

[0094] FIG.8 illustrates an exemplary apparatus 800 for providing contrastive training data according to an embodiment.

[0095] The apparatus 800 may comprise: a positive example obtaining module 810, for obtaining a positive example from a training data set, the positive example including a first text and a second text labelled as relevant; a contrastive information extracting module 820, for extracting contrastive information from a search log; a text amending module 830, for amending the first text based at least on the contrastive information; and a negative example generating module 840, for combining the amended first text and the second text into a negative example which is contrastive to the positive example, the amended first text and the second text being labelled as irrelevant in the negative example. [0096] In an implementation, the contrastive information extracting module 820 may be for: extracting at least one multi-turn search session from the search log; and generating a contrastive word pair set with queries in the at least one multi-turn search session. The generating a contrastive word pair set may comprise: extracting candidate options from the queries in the at least one multi-turn search session; clustering the candidate options into one or more groups with a semi -structured data corpus; and combining any two candidate options in each group into a contrastive word pair.

[0097] In an implementation, the contrastive information extracting module 820 may be for: determining a contrastive query corresponding to the first text from the search log. [0098] Moreover, the apparatus 800 may further comprise any other module configured for any operation of providing contrastive training data.

[0099] FIG.9 illustrates an exemplary apparatus 900 for providing contrastive training data according to an embodiment.

[00100] The apparatus 900 may comprise at least one processor 910. The apparatus 900 may further comprise a memory 920 coupled to the processor 910. The memory 920 may store computer-executable instructions that, when executed, cause the processor 910 to: obtain a positive example from a training data set, the positive example including a first text and a second text labelled as relevant; extract contrastive information from a search log; amend the first text based at least on the contrastive information; and combine the amended first text and the second text into a negative example which is contrastive to the positive example, the amended first text and the second text being labelled as irrelevant in the negative example. Moreover, the processor 910 may be further configured for performing any other operations of the methods for providing contrastive training data according to the embodiments of the present disclosure described above.

[00101] The embodiments of the present disclosure may be embodied in a non- transitory computer-readable medium. The non-transitory computer-readable medium may comprise instructions that, when executed, cause one or more processors to perform any operations of the methods for providing contrastive training data according to the embodiments of the present disclosure described above.

[00102] It should be understood that all the operations in the methods described above are merely exemplary, and the present disclosure is not limited to any operations in the methods or sequence orders of these operations, and should cover all other equivalents under the same or similar concepts.

[00103] It should also be understood that all the modules in the apparatuses described above may be implemented in various approaches. These modules may be implemented as hardware, software, or a combination thereof. Moreover, any of these modules may be further functionally divided into sub-modules or combined together.

[00104] Processors are described in connection with various apparatus and methods. These processors may be implemented using electronic hardware, computer software, or any combination thereof. Whether these processors are implemented as hardware or software will depend on the specific application and the overall design constraints imposed on the system. By way of example, a processor, any portion of a processor, or any combination of processors presented in this disclosure may be implemented as a microprocessor, a micro-controller, a digital signal processor (DSP), a field programmable gate array (FPGA) , a programmable logic device (PLD), state machine, gate logic, discrete hardware circuitry, and other suitable processing components configured to perform the various functions described in this disclosure. The functions of a processor, any portion of a processor, or any combination of processors presented in this disclosure may be implemented as software executed by a microprocessor, a micro-controller, a DSP, or other suitable platforms.

[00105] Software should be considered broadly to represent instructions, instruction sets, code, code segments, program code, programs, subroutines, software modules, applications, software applications, software packages, routines, subroutines, objects, running threads, processes, functions, etc. Software may reside on computer readable medium. Computer readable medium may include, e.g., a memory, which may be, e.g., a magnetic storage device (e.g., a hard disk, a floppy disk, a magnetic strip), an optical disk, a smart card, a flash memory device, a random access memory (RAM), a read only memory (ROM), a programmable ROM (PROM), an erasable PROM (EPROM), an electrically erasable PROM (EEPROM), a register, or a removable disk. Although a memory is shown as being separate from the processor in various aspects presented in this disclosure, a memory may also be internal to the processor (e.g., a cache or a register).

[00106] The previous description is provided to enable any person skilled in the art to practice the various aspects described herein. Various modifications to these aspects will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other aspects. Thus, the claims are not intended to be limited to the aspects shown herein. All structural and functional equivalents to the elements of the various aspects described throughout the present disclosure that are known or later come to be known to those of ordinary skilled in the art are intended to be encompassed by the claims.

Claims

1. A method for providing contrastive training data, comprising: obtaining a positive example from a training data set, the positive example including a first text and a second text labelled as relevant; extracting contrastive information from a search log; amending the first text based at least on the contrastive information; and combining the amended first text and the second text into a negative example which is contrastive to the positive example, the amended first text and the second text being labelled as irrelevant in the negative example.

2. The method of claim 1, wherein the extracting contrastive information from a search log comprises: extracting at least one multi-turn search session from the search log; and generating a contrastive word pair set with queries in the at least one multi-turn search session.

3. The method of claim 2, wherein the at least one multi-turn search session has the same first-turn query.

4. The method of claim 2, wherein the generating a contrastive word pair set comprises: extracting candidate options from the queries in the at least one multi-turn search session; clustering the candidate options into one or more groups with a semi-structured data corpus; and combining any two candidate options in each group into a contrastive word pair.

5. The method of claim 4, wherein the extracting candidate options comprises: for each multi -turn search session, extracting words not shared in every two adjacent queries as the candidate options.

6. The method of claim 4, wherein the clustering comprises: for two target candidate options in the candidate options, calculating similarity between the two target candidate options based at least on occurrence information of the two target candidate options in the semi-structured data corpus.

7. The method of claim 4, wherein the clustering comprises: determining, through a greedy clustering approach, a group to which each candidate option in the candidate options corresponds.

8. The method of claim 4, further comprising: identifying a group including two or more synonymic candidate options; and retaining, in the group, only one candidate option in the two or more synonymic candidate options.

9. The method of claim 2, wherein the amending the first text comprises: identifying a target word which is included in the first text and included in a contrastive word pair in the contrastive word pair set; and replacing, in the first text, the target word by another word in the contrastive word pair.

10. The method of claim 1, wherein the extracting contrastive information from a search log comprises: determining a contrastive query corresponding to the first text from the search log.

11. The method of claim 10, wherein the determining a contrastive query comprises: determining, from the search log, at least one relevant query which is relevant to the first text; for each relevant query, calculating contrastive parameter values between the first text and the relevant query based at least on a search record corresponding to the positive example and a search record corresponding to the relevant query in the search log; and selecting, from the at least one relevant query, a relevant query which has contrastive parameter values conforming to predetermined criteria as the contrastive query.

12. The method of claim 11, wherein the calculating contrastive parameter values comprises: determining the number of co-displayed links and the number of co-clicked links between the first text and the relevant query, based on the search record corresponding to the positive example and the search record corresponding to the relevant query; and calculating the contrastive parameter values between the first text and the relevant query based at least on the number of co-displayed links and the number of co-clicked links.

13. The method of claim 10, wherein the amending the first text comprises: replacing the first text by the contrastive query.

14. An apparatus for providing contrastive training data, comprising: a positive example obtaining module, for obtaining a positive example from a training data set, the positive example comprising a first text and a second text labelled as relevant; a contrastive information extracting module, for extracting contrastive information from a search log; a text amending module, for amending the first text based at least on the contrastive information; and a negative example generating module, for combining the amended first text and the second text into a negative example which is contrastive to the positive example, the amended first text and the second text being labelled as irrelevant in the negative example.

15. An apparatus for providing contrastive training data, comprising: at least one processor; and a memory storing computer-executable instructions that, when executed, cause the at least one processor to: obtain a positive example from a training data set, the positive example comprising a first text and a second text labelled as relevant, extract contrastive information from a search log, amend the first text based at least on the contrastive information, and combine the amended first text and the second text into a negative example which is contrastive to the positive example, the amended first text and the second text being labelled as irrelevant in the negative example.