CN113723115B - Open domain question-answer prediction method based on pre-training model and related equipment - Google Patents
- Publication number
- CN113723115B CN113723115B CN202111167748.7A CN202111167748A CN113723115B CN 113723115 B CN113723115 B CN 113723115B CN 202111167748 A CN202111167748 A CN 202111167748A CN 113723115 B CN113723115 B CN 113723115B
- Authority
- CN
- China
- Prior art keywords
- cluster
- fragment
- segment
- target
- query
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
- G06F40/35—Discourse or dialogue representation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/242—Query formulation
- G06F16/2433—Query languages
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/335—Filtering based on additional data, e.g. user or group profiles
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/36—Creation of semantic tools, e.g. ontology or thesauri
- G06F16/367—Ontology
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/284—Lexical analysis, e.g. tokenisation or collocates
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
The application relates to the technical field of artificial intelligence, and in particular provides an open domain question-answer prediction method based on a pre-training model and related equipment. The method comprises the following steps: encoding a query statement to obtain a query vector; matching the query vector against at least one fragment cluster to determine the target fragment cluster to which the query statement belongs; selecting at least one fragment from the target fragment cluster, obtaining an updated query statement from the at least one fragment, and calculating the posterior probability of the updated query statement against the fragments in the target fragment cluster; repeatedly executing the operations of selecting at least one fragment according to the posterior probability and obtaining an updated query statement from the at least one fragment, until no fragment directly connected with the currently selected at least one fragment remains in the target fragment cluster; and calculating the posterior probability of the latest query statement against the fragments in the target fragment cluster, and returning a question-answer result according to that posterior probability. The embodiment of the application is beneficial to improving prediction efficiency in open domain question answering.
Description
Technical Field
The application relates to the technical field of intelligent questions and answers, in particular to an open domain question and answer prediction method based on a pre-training model and related equipment.
Background
With the development of the internet, the business volume of every industry has grown rapidly, and the customer base has gradually shifted from offline to online; the number of human customer-service agents and the processing efficiency of each enterprise fall far behind this acceleration of online customers, so various intelligent question-answering systems are urgently needed to relieve the pressure. Most existing intelligent question-answering systems are based on a closed domain, that is, the question-and-answer knowledge base is limited to a specific domain, such as banking or insurance Q&A. Driven by customer demand, researchers have proposed open-domain question answering (open-domain QA), which is not limited to questions in a particular domain but learns knowledge from massive text documents across industries (such as knowledge bases like Wikipedia), so that questions in any domain can be answered. In current open-domain question-answering systems, the posterior probability is computed between the query sentence and the massive fragments one by one, and the fragments with high probability are extracted.
Disclosure of Invention
Aiming at the problems, the application provides an open domain question-answering prediction method and related equipment based on a pre-training model, which are beneficial to improving the prediction efficiency in the open domain question-answering.
To achieve the above object, a first aspect of an embodiment of the present application provides an open domain question-answer prediction method based on a pre-training model, where the method includes:
encoding an input query sentence by adopting a pre-training model to obtain a query vector of the query sentence;
matching the query vector with at least one fragment cluster in a pre-constructed cluster map to determine a target fragment cluster to which the query statement belongs from the at least one fragment cluster;
selecting at least one fragment from the target fragment cluster, obtaining an updated query statement according to the at least one fragment, and calculating the posterior probability of the updated query statement and the first fragment in the target fragment cluster;
repeatedly executing the operations of selecting at least one fragment from the target fragment cluster according to the posterior probability obtained last time and obtaining the current updated query statement according to the at least one fragment until no fragment directly connected with the at least one currently selected fragment exists in the target fragment cluster;
And calculating the target posterior probability of the second segment in the current updated query sentence and target segment cluster, and returning an open domain question-answering result of the query sentence according to the target posterior probability.
With reference to the first aspect, in one possible implementation manner, at least one fragment cluster is obtained by clustering fragment data in each field, and before the input query sentence is encoded by adopting the pre-training model, the method further includes:
determining a radius and a neighborhood density threshold value adopted by clustering the segment data in each field in a clustering algorithm;
and constructing a cluster map based on the radius and the neighborhood density threshold.
With reference to the first aspect, in one possible implementation manner, determining a radius adopted by clustering segment data of each domain in a clustering algorithm includes:
the segment data in each field is encoded by adopting a pre-training model, so as to obtain at least one semantic vector;
carrying out logarithmic sampling on at least one semantic vector to obtain a first target number of points;
calculating the average distance between the points of the first target number of points;
repeatedly executing K times to perform logarithmic sampling on at least one semantic vector and calculating the average distance between points to obtain K average distances between points, wherein K is an integer greater than 1;
The average value of the average distance between K points is taken as the radius.
With reference to the first aspect, in one possible implementation manner, determining a neighborhood density threshold value adopted for clustering segment data of each domain in a clustering algorithm includes:
carrying out logarithmic sampling on at least one semantic vector to obtain a second target number of points;
randomly selecting one point from the second target number of points as a clustering center, and calculating the number of similar points of the clustering center according to the radius and a predefined discriminant function;
repeatedly executing operations of carrying out logarithmic sampling on at least one semantic vector for K times, randomly selecting one point as a clustering center and calculating the number of similar points of the clustering center to obtain K number values;
the average value of the K number values is taken as a neighborhood density threshold value.
With reference to the first aspect, in a possible implementation manner, constructing the cluster map based on the radius and the neighborhood density threshold includes:
starting from any semantic vector in at least one semantic vector, acquiring the number of neighborhood points of the any semantic vector according to the radius, determining the any semantic vector as a boundary point if the number of the neighborhood points is smaller than a neighborhood density threshold value, and determining the any semantic vector as a core point if the number of the neighborhood points is larger than or equal to the neighborhood density threshold value;
If any semantic vector is a core point, determining a point with reachable any semantic vector density and any semantic vector density as a fragment cluster, and if any semantic vector is a boundary point, adding any semantic vector into the fragment cluster to which the core point with reachable any semantic vector density belongs until the core point in at least one semantic vector is clustered to obtain at least one fragment cluster;
and giving an edge to the neighborhood point in each segment cluster in at least one segment cluster to obtain a cluster map.
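The construction described above follows a DBSCAN-style procedure: points with at least the neighborhood-density-threshold number of neighbors within the radius are core points, clusters grow along density-reachable core points, boundary points attach to a reachable core point's cluster, and neighborhood points within a cluster are joined by edges. A minimal sketch under those assumptions (function and variable names are illustrative, not the patent's):

```python
import numpy as np

def build_cluster_map(vectors, eps, min_pts):
    """DBSCAN-style sketch of the described cluster-map construction.

    Returns (labels, edges): labels[i] is the cluster id of vector i
    (-1 for unclustered), and edges connects neighborhood points that
    fall in the same fragment cluster.
    """
    n = len(vectors)
    # pairwise Euclidean distances between all semantic vectors
    dist = np.linalg.norm(vectors[:, None, :] - vectors[None, :, :], axis=-1)
    neighbors = [set(np.flatnonzero(dist[i] <= eps)) - {i} for i in range(n)]
    core = [len(neighbors[i]) >= min_pts for i in range(n)]

    labels = [-1] * n
    cluster_id = 0
    for i in range(n):
        if labels[i] != -1 or not core[i]:
            continue
        # grow one fragment cluster from this core point via density reachability
        stack, labels[i] = [i], cluster_id
        while stack:
            p = stack.pop()
            for q in neighbors[p]:
                if labels[q] == -1:
                    labels[q] = cluster_id   # boundary or core point joins the cluster
                    if core[q]:
                        stack.append(q)      # only core points keep expanding
        cluster_id += 1

    # give an edge to each pair of neighborhood points inside the same cluster
    edges = {(i, int(j)) for i in range(n) for j in neighbors[i]
             if i < j and labels[i] == labels[int(j)] != -1}
    return labels, edges
```

With two well-separated groups of points and a small radius, the sketch yields two fragment clusters whose edges all stay inside a single cluster.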
With reference to the first aspect, in one possible implementation manner, matching the query vector with at least one segment cluster in the pre-constructed cluster map to determine a target segment cluster to which the query statement belongs from the at least one segment cluster includes:
calculating an average value of core points in each segment cluster for each segment cluster of the at least one segment cluster;
taking the average value of core points in each segment cluster as the clustering center of each segment cluster;
calculating the target distance between the query vector and the clustering center of each fragment cluster;
and determining the fragment cluster represented by the cluster center with the smallest target distance in at least one fragment cluster as a target fragment cluster.
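The matching step above reduces to a nearest-cluster-center lookup. A minimal sketch, assuming each cluster's core points are available as an array (the container layout and names are assumptions of this sketch):

```python
import numpy as np

def match_target_cluster(query_vec, core_points_by_cluster):
    """Pick the target fragment cluster as described above: each cluster
    center is the mean of that cluster's core points, and the cluster
    whose center has the smallest distance to the query vector wins."""
    centers = {cid: pts.mean(axis=0) for cid, pts in core_points_by_cluster.items()}
    distances = {cid: float(np.linalg.norm(query_vec - center))
                 for cid, center in centers.items()}
    return min(distances, key=distances.get)
```

Using the mean of core points (rather than of all members) keeps boundary points from dragging the center toward neighboring clusters.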
With reference to the first aspect, in one possible implementation manner, the encoding the input query statement using the pre-training model to obtain a query vector of the query statement includes:
preprocessing the query sentence to obtain a word vector of the query sentence;
obtaining a query matrix, a key matrix and a value matrix based on word vector calculation;
calculating attention weights based on the query matrix, the key matrix and the value matrix;
the attention weight is multiplied by the value matrix to obtain an attention vector, and the attention vector is encoded to obtain a query vector.
A second aspect of the embodiments of the present application provides an open domain question-answer prediction apparatus based on a pre-training model, where the apparatus includes:
the coding unit is used for coding the input query statement by adopting the pre-training model to obtain a query vector of the query statement;
the matching unit is used for matching the query vector with at least one fragment cluster in the pre-constructed cluster map so as to determine a target fragment cluster to which the query statement belongs from the at least one fragment cluster;
the updating unit is used for selecting at least one fragment from the target fragment cluster, obtaining an updated query statement according to the at least one fragment, and calculating the posterior probability of the updated query statement and the first fragment in the target fragment cluster;
The updating unit is further used for repeatedly executing the operations of selecting at least one fragment from the target fragment cluster according to the posterior probability obtained last time and obtaining the current updated query statement according to the at least one fragment until no fragment directly connected with the at least one currently selected fragment exists in the target fragment cluster;
the prediction unit is used for calculating the target posterior probability of the second segment in the current updated query statement and the target segment cluster, and returning an open domain question-answer result of the query statement according to the target posterior probability.
A third aspect of the embodiments of the present application provides an electronic device, including an input device and an output device, and further including a processor adapted to implement one or more instructions; and a computer storage medium storing one or more instructions adapted to be loaded by the processor and to perform the steps of:
encoding an input query sentence by adopting a pre-training model to obtain a query vector of the query sentence;
matching the query vector with at least one fragment cluster in a pre-constructed cluster map to determine a target fragment cluster to which the query statement belongs from the at least one fragment cluster;
Selecting at least one fragment from the target fragment cluster, obtaining an updated query statement according to the at least one fragment, and calculating the posterior probability of the updated query statement and the first fragment in the target fragment cluster;
repeatedly executing the operations of selecting at least one fragment from the target fragment cluster according to the posterior probability obtained last time and obtaining the current updated query statement according to the at least one fragment until no fragment directly connected with the at least one currently selected fragment exists in the target fragment cluster;
and calculating the target posterior probability of the second segment in the current updated query sentence and target segment cluster, and returning an open domain question-answering result of the query sentence according to the target posterior probability.
A fourth aspect of the present embodiments provides a computer storage medium storing one or more instructions adapted to be loaded by a processor and to perform the steps of:
encoding an input query sentence by adopting a pre-training model to obtain a query vector of the query sentence;
matching the query vector with at least one fragment cluster in a pre-constructed cluster map to determine a target fragment cluster to which the query statement belongs from the at least one fragment cluster;
Selecting at least one fragment from the target fragment cluster, obtaining an updated query statement according to the at least one fragment, and calculating the posterior probability of the updated query statement and the first fragment in the target fragment cluster;
repeatedly executing the operations of selecting at least one fragment from the target fragment cluster according to the posterior probability obtained last time and obtaining the current updated query statement according to the at least one fragment until no fragment directly connected with the at least one currently selected fragment exists in the target fragment cluster;
and calculating the target posterior probability of the second segment in the current updated query sentence and target segment cluster, and returning an open domain question-answering result of the query sentence according to the target posterior probability.
The scheme of the application at least comprises the following beneficial effects:
in the embodiment of the application, the input query statement is encoded by adopting a pre-training model to obtain the query vector of the query statement; matching the query vector with at least one fragment cluster in a pre-constructed cluster map to determine a target fragment cluster to which the query statement belongs from the at least one fragment cluster; selecting at least one fragment from the target fragment cluster, obtaining an updated query statement according to the at least one fragment, and calculating the posterior probability of the updated query statement and the first fragment in the target fragment cluster; repeatedly executing the operations of selecting at least one fragment from the target fragment cluster according to the posterior probability obtained last time and obtaining the current updated query statement according to the at least one fragment until no fragment directly connected with the at least one currently selected fragment exists in the target fragment cluster; and calculating the target posterior probability of the second segment in the current updated query sentence and target segment cluster, and returning an open domain question-answering result of the query sentence according to the target posterior probability. When a query statement is input, a target fragment cluster is selected from at least one fragment cluster, the target fragment cluster is used as a database, at least one fragment is screened from layers in the target fragment cluster, and the fragment with the maximum target posterior probability and the fragment which is related to the fragment with the maximum target posterior probability in each layer are returned as open domain question-answering results.
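The iterative selection described above can be sketched as a greedy walk over the cluster map: score every fragment in the target cluster, append the best one to the query, then repeatedly score only the fragments directly connected to the newest selection until none remain. The `adjacency` mapping and the `score` callback below stand in for the cluster-map edges and the posterior-probability computation; both interfaces are assumptions of this sketch, not the patent's exact ones.

```python
def iterative_fragment_selection(query, target_cluster, adjacency, score):
    """Greedy sketch of the selection loop described above.

    adjacency[frag] lists fragments directly connected (sharing an edge)
    with `frag` inside the target cluster; score(query, frag) stands in
    for the posterior probability of `frag` given the current query.
    """
    # first round: score every fragment in the target fragment cluster
    current_query = query
    best = max(target_cluster, key=lambda f: score(current_query, f))
    selected = [best]
    current_query = current_query + " " + best

    # keep expanding along edges until the newest fragment has no
    # directly connected, not-yet-selected neighbor left
    while True:
        candidates = [f for f in adjacency.get(selected[-1], []) if f not in selected]
        if not candidates:
            break
        best = max(candidates, key=lambda f: score(current_query, f))
        selected.append(best)
        current_query = current_query + " " + best
    return current_query, selected
```

Each round only scores the direct neighbors of the last selection, which is what narrows the per-step computation compared with scoring every fragment every time.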
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a schematic diagram of an application environment provided in an embodiment of the present application;
fig. 2 is a flow chart of an open domain question-answer prediction method based on a pre-training model according to an embodiment of the present application;
fig. 3 is a schematic diagram of generating a cluster map according to an embodiment of the present application;
FIG. 4 is a flowchart of another open domain question-answer prediction method based on a pre-training model according to an embodiment of the present application;
fig. 5 is a schematic structural diagram of an open domain question-answer prediction device based on a pre-training model according to an embodiment of the present application;
fig. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
In order to make the present application solution better understood by those skilled in the art, the following description will be made in detail and with reference to the accompanying drawings in the embodiments of the present application, it is apparent that the described embodiments are only some embodiments of the present application, not all embodiments. All other embodiments, which can be made by one of ordinary skill in the art based on the embodiments herein without making any inventive effort, shall fall within the scope of the present application.
The terms "comprising" and "having" and any variations thereof, as used in the specification, claims and drawings, are intended to cover a non-exclusive inclusion. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those listed steps or elements but may include other steps or elements not listed or inherent to such process, method, article, or apparatus. Furthermore, the terms "first," "second," and "third," etc. are used for distinguishing between different objects and not for describing a particular sequential order.
The embodiment of the application provides an open domain question-answer prediction method based on a pre-training model, which can be implemented in the application environment shown in fig. 1. Referring to fig. 1, the application environment comprises an electronic device and a user device connected to the electronic device through a network. The user device provides an input interface for receiving query statements entered by the user, for example a user's query about product details, and a communication interface for transmitting the query statement to the electronic device. The electronic device receives the query statement through its own communication interface and passes it to the processor, so that the processor executes the open domain question-answer prediction method based on the pre-training model. Because the electronic device narrows the query range to the target fragment cluster instead of querying within every fragment cluster, the amount of query computation is greatly reduced, which improves prediction efficiency in open domain question answering.
The electronic device may be a stand-alone server, or may be a cloud server that provides cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communications, middleware services, domain name services, security services, content delivery networks (Content Delivery Network, CDN), and basic cloud computing services such as big data and artificial intelligence platforms. Any of the at least one terminal may be a smart phone, a computer, a wearable device, an in-vehicle device, etc.
Based on the application environment shown in fig. 1, the open domain question-answer prediction method based on the pre-training model provided in the embodiment of the present application is described in detail below in conjunction with other drawings.
Referring to fig. 2, fig. 2 is a flowchart of an open domain question-answer prediction method based on a pre-training model according to an embodiment of the present application, where the method is applied to an electronic device, as shown in fig. 2, and includes steps 201-205:
201: and encoding the input query statement by adopting a pre-training model to obtain a query vector of the query statement.
In a specific embodiment of the present application, the pre-training model may be a BERT (Bidirectional Encoder Representations from Transformers) model, where the BERT model is trained and fine-tuned in advance on the data of each domain, so that the model can learn deep information in the data of each domain. Illustratively, encoding the input query statement with the pre-training model to obtain the query vector of the query statement comprises the following steps:
Preprocessing the query sentence to obtain a word vector of the query sentence;
obtaining a query matrix, a key matrix and a value matrix based on word vector calculation;
calculating attention weights based on the query matrix, the key matrix and the value matrix;
the attention weight is multiplied by the value matrix to obtain an attention vector, and the attention vector is encoded to obtain a query vector.
It should be appreciated that the BERT model encodes with Transformer encoders. The bottom encoder first preprocesses the input query statement (Query) to obtain the corresponding word vectors, for example by word embedding or one-hot encoding. The self-attention layer of the Transformer encoder constructs the corresponding query vector q, key vector k, and value vector v based on the word vectors, and multiplies them respectively with the pre-trained query weight matrix $W^Q$, key weight matrix $W^K$, and value weight matrix $W^V$ to obtain the query matrix Q, the key matrix K, and the value matrix V. The attention weight is then calculated as $\mathrm{softmax}\left(QK^{\mathsf{T}}/\sqrt{d_k}\right)$, where $d_k$ is the dimension of the key vectors. Finally, the attention weight is multiplied by the value matrix V to obtain the output attention vector of the self-attention layer, and the attention vector is encoded through a feed-forward neural network to obtain the query vector.
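A minimal single-head sketch of this encoding step follows. The identity matrices passed in the example stand in for the pre-trained weight matrices, and mean pooling stands in for the feed-forward encoding; both are assumptions of the sketch, not the patent's exact procedure.

```python
import numpy as np

def encode_query(word_vecs, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention over the query's
    word vectors, mirroring the steps described above."""
    Q, K, V = word_vecs @ Wq, word_vecs @ Wk, word_vecs @ Wv
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                          # QK^T / sqrt(d_k)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    attn = weights @ V                                       # attention vectors
    return attn.mean(axis=0)                                 # pool into one query vector
```

With identical word vectors and identity weights, every attention row is uniform, so the pooled query vector equals the shared word vector.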
202: matching the query vector with at least one fragment cluster in the pre-constructed cluster map to determine a target fragment cluster to which the query statement belongs from the at least one fragment cluster.
In a specific embodiment of the present application, at least one fragment cluster is obtained by clustering fragment data in each field, and before an input query sentence is encoded by adopting a pre-training model, the method further includes:
determining a radius and a neighborhood density threshold value adopted by clustering the segment data in each field in a clustering algorithm;
and constructing a cluster map based on the radius and the neighborhood density threshold.
Illustratively, determining a radius used for clustering segment data of each domain in a clustering algorithm includes:
the segment data in each field is encoded by adopting a pre-training model, so as to obtain at least one semantic vector;
carrying out logarithmic sampling on at least one semantic vector to obtain a first target number of points;
calculating the average distance between the points of the first target number of points;
repeatedly executing K times to perform logarithmic sampling on at least one semantic vector and calculating the average distance between points to obtain K average distances between points, wherein K is an integer greater than 1;
The average value of the average distance between K points is taken as the radius.
Specifically, suppose the number of the at least one semantic vector is $N$, and each logarithmic sampling draws $m = \log N$ points, i.e. the first target number is $\log N$. For any two of the $m$ sampled points $x_i$ and $x_j$, using the Euclidean distance as the distance measure, the average distance between the $m$ points is expressed as follows:

$$\bar{d} = \frac{2}{m(m-1)} \sum_{i=1}^{m-1} \sum_{j=i+1}^{m} d(x_i, x_j)$$

wherein $\bar{d}$ represents the average distance between the $m$ points, and $d(x_i, x_j)$ represents the Euclidean distance between points $x_i$ and $x_j$.

For the at least one semantic vector, in order to avoid sampling imbalance, the operations of logarithmically sampling the at least one semantic vector and calculating the average inter-point distance are repeated K times to obtain K average inter-point distances, and the average of the K values is calculated as the neighborhood radius Eps, with the following formula:

$$\mathrm{Eps} = \frac{1}{K} \sum_{k=1}^{K} \bar{d}_k$$

wherein $\bar{d}_1, \bar{d}_2, \ldots, \bar{d}_K$ represent the average inter-point distance of the 1st, 2nd, ..., K-th rounds, and Eps represents the final neighborhood radius. In this embodiment, $\log N$ points are sampled because the number of samples in an open domain scenario is very large, usually tens of millions or more, and computing all pairwise distances would be extremely expensive (on the order of $N^2$). Taking a logarithmic sample therefore significantly reduces the amount of computation, while repeating the logarithmic sampling K times addresses the problem of sampling imbalance.
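The radius estimation above can be sketched as follows. The sample size `max(2, int(math.log(n)))` is an assumption (the patent only says the sample is logarithmic in the corpus size), and the helper names are illustrative:

```python
import math
import random
import numpy as np

def estimate_eps(vectors, k_rounds=5, seed=0):
    """Estimate the neighborhood radius Eps as described above: K rounds
    of log-size sampling, the mean pairwise Euclidean distance per round,
    then the mean over the K rounds."""
    rng = random.Random(seed)
    n = len(vectors)
    m = max(2, int(math.log(n)))  # log-size sample (floor of ln n, assumed)
    round_means = []
    for _ in range(k_rounds):
        sample = [vectors[i] for i in rng.sample(range(n), m)]
        dists = [np.linalg.norm(a - b)
                 for i, a in enumerate(sample) for b in sample[i + 1:]]
        round_means.append(sum(dists) / len(dists))
    return sum(round_means) / k_rounds
```

Each round touches only $O(\log^2 N)$ distances instead of the $O(N^2)$ full computation, which is the point of the logarithmic sampling.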
Illustratively, determining a neighborhood density threshold value used for clustering segment data of each domain in a clustering algorithm includes:
carrying out logarithmic sampling on at least one semantic vector to obtain a second target number of points;
randomly selecting one point from the second target number of points as a clustering center, and calculating the number of similar points of the clustering center according to the radius and a predefined discriminant function;
repeatedly executing operations of carrying out logarithmic sampling on at least one semantic vector for K times, randomly selecting one point as a clustering center and calculating the number of similar points of the clustering center to obtain K number values;
the average value of the K number values is taken as a neighborhood density threshold value.
Specifically, in order to reduce the calculation amount, when determining the neighborhood density threshold value, at least one semantic vector is also subjected to logarithmic sampling to obtainA point, i.e. the second target number is +.>Then from this->One point is randomly selected from the points>As a cluster center, the number of points belonging to the same category as the cluster center is then calculated based on the previously determined radius parameter Eps and a predefined discriminant function defined as:
;
Wherein,representing the ratio of the two-point spacing to Eps, the discriminant function +.>The representation is: for a point, if the distance between the point nearby and the point is smaller than Eps, the point is the same kind of point. The number of its homologous pointsThe calculation formula of (2) is as follows:
;
wherein,representing the cluster center in a single computation->The number of homologous points of ∈ ->Representation dot->And cluster center->Is a euclidean distance of (c). Similar to the radius parameter Eps, in order to avoid the problem of unbalanced sampling, operations of logarithmic sampling, selecting a cluster center and calculating the number of similar points of the cluster center are repeated for K times to obtain K number values, and calculating an average value of the K number values as a neighborhood density threshold value Minpts, wherein the formula is as follows:
;
wherein,,/>,…,/>the 1 st number value, the 2 nd number value, …, the K number value are respectively represented. In this embodiment, similar to the radius parameter Eps, the neighborhood density threshold mints is determined in an adaptive manner, and in the clustering, the radius Eps and the neighborhood density threshold mints need to be determined in advanceThe values tend to bring distinct clustering results according to different selections, so that the accuracy of the final returned result is affected, therefore, the selection of parameters Eps and Minpts is very important, if the parameters Eps and Minpts are selected by using a rule of thumb, larger instability is often caused, and the stability of the clustering clusters can be increased by adopting a self-adaptive parameter selection method according to the segment data prepared in advance, so that the fluctuation of the question-answer result is obviously reduced.
Illustratively, constructing the cluster map based on the radius and the neighborhood density threshold includes:
starting from any semantic vector in at least one semantic vector, acquiring the number of neighborhood points of the any semantic vector according to the radius, determining the any semantic vector as a boundary point if the number of the neighborhood points is smaller than a neighborhood density threshold value, and determining the any semantic vector as a core point if the number of the neighborhood points is larger than or equal to the neighborhood density threshold value;
if any semantic vector is a core point, determining a point with reachable any semantic vector density and any semantic vector density as a fragment cluster, and if any semantic vector is a boundary point, adding any semantic vector into the fragment cluster to which the core point with reachable any semantic vector density belongs until the core point in at least one semantic vector is clustered to obtain at least one fragment cluster;
and giving an edge to the neighborhood point in each segment cluster in at least one segment cluster to obtain a cluster map.
Specifically, each piece of fragment data corresponds to a semantic vector, the semantic vector is expressed as each point in a high-dimensional space, any semantic vector is expressed as a point p, the number of neighborhood points of the point p is determined according to a preset radius Eps, if the number of the neighborhood points of the point p is smaller than a neighborhood density threshold value mps, the point p is a boundary point, if the number of the neighborhood points of the point p is greater than or equal to the neighborhood density threshold value mps, the point p is a core point, as shown in fig. 3, the neighborhood density threshold value mps is assumed to be 3, 3 points exist in the neighborhood of the point p, the point p is a core point, only two points exist in the neighborhood of the point q, and the point q is a boundary point. If the point p is a core point, a segment cluster can be determined, points with reachable point p density belong to the segment cluster, if the point p is a boundary point, the point p can be divided into segment clusters with reachable core points, the segment clusters with all core points belong to are determined according to the method, at least one segment cluster is obtained, for each segment cluster in at least one segment cluster, an edge is assigned to a neighborhood point in each segment cluster, for example, in fig. 3, the point p is in the neighborhood of the point q, an edge is assigned to the point p and the point q, the point p is in the neighborhood of the point s, an edge is assigned to the point p and the point s, a graph corresponding to each segment cluster is obtained, all graphs form the cluster graph, and the cluster graph is stored for subsequent matching.
Illustratively, matching the query vector with at least one fragment cluster in the pre-constructed cluster map to determine a target fragment cluster to which the query statement belongs from the at least one fragment cluster, including:
calculating an average value of core points in each segment cluster for each segment cluster of the at least one segment cluster;
taking the average value of core points in each segment cluster as the clustering center of each segment cluster;
calculating the target distance between the query vector and the clustering center of each fragment cluster;
and determining the fragment cluster represented by the cluster center with the smallest target distance in at least one fragment cluster as a target fragment cluster.
And if the target distance between the query vector and the clustering center is the smallest, namely the closest distance between the clustering center and the query vector is indicated, the query sentence belongs to the category represented by the clustering center.
203: at least one segment is selected from the target segment cluster, an updated query statement is obtained according to the at least one segment, and the posterior probability of the updated query statement and the first segment in the target segment cluster is calculated.
In a specific embodiment of the present application, a target segment cluster is used for querying a database, a posterior probability of each segment in the Query vector and the target segment cluster is calculated, segments in the target segment cluster are ordered according to the posterior probability, at least one segment with a posterior probability greater than or equal to a preset value is selected, for example, at least one segment is P1, P2 and P3, respectively, P1, P2 and P3 are combined with Query sentences to obtain updated Query sentences, for example, query sentences P1 are formed with P1, the updated Query sentences P1 are used as new inputs, and the posterior probability of the updated Query sentences P1 and first segments in the target segment cluster is calculated, wherein the first segments refer to segments except P1 in the target segment cluster.
204: and repeatedly executing the operations of selecting at least one fragment from the target fragment cluster according to the posterior probability obtained last time and obtaining the current updated query statement according to the at least one fragment until no fragment directly connected with the at least one currently selected fragment exists in the target fragment cluster.
In this embodiment of the present application, according to the posterior probability obtained by the previous calculation, the segments except for P1 in the target segment cluster are ordered, at least one segment with the posterior probability greater than or equal to a preset value is selected, for example, at least one segment is P11, P12, and P13, respectively, P11, P12, and P13 and the last input Query P1 form an updated Query statement, for example, a current updated Query statement Query P1P 12 is formed with P12, and the above operations are repeated until at least one segment currently selected in each path of P11, P12, and P13 does not have a segment directly connected to the target segment cluster, that is, by analysis of a cluster map, the target segment cluster does not have a segment having a correlation with the at least one currently selected segment.
205: and calculating the target posterior probability of the second segment in the current updated query sentence and target segment cluster, and returning an open domain question-answering result of the query sentence according to the target posterior probability.
In a specific embodiment of the present application, assuming that after a current updated Query statement Query P1P 12 is formed, a segment directly connected to P12 does not exist in a target segment cluster, updating input is stopped, and a target posterior probability of a second segment in the current updated Query statement Query P1P 12 and the target segment cluster is calculated, where the second segment refers to a segment in the target segment cluster except for P12, the second segment is ordered according to the target posterior probability, a segment with the maximum target posterior probability, such as P115, is selected, and P115, P12 and P1 are used as an open domain question-answer result of the Query statement, and then the open domain question-answer result is returned to a user. Of course, the above is an example, in the actual scenario, there is also an updated query sentence composed of P2 and P3, and the target posterior probability is the largest posterior probability in all the current updated query sentences. The number of at least one segment selected at a time may be the same or different according to the requirement of the correlation between segments, for example, the value of the calculated posterior probability may not be high as a whole when the input is updated backward, so the number of at least one segment may be in a decreasing trend to reduce the calculation amount.
It can be seen that in the embodiment of the present application, by encoding an input query sentence by using a pre-training model, a query vector of the query sentence is obtained; matching the query vector with at least one fragment cluster in a pre-constructed cluster map to determine a target fragment cluster to which the query statement belongs from the at least one fragment cluster; selecting at least one segment from the target segment cluster, obtaining an updated query statement according to the at least one segment, and calculating the posterior probability of the updated query statement and the first segment in the target segment cluster; repeatedly executing the operations of selecting at least one fragment from the target fragment cluster according to the posterior probability obtained last time and obtaining a current updated query statement according to the at least one fragment until no fragment directly connected with the at least one currently selected fragment exists in the target fragment cluster; and calculating the target posterior probability of the second segment in the current updated query sentence and the target segment cluster, and returning an open domain question-answer result of the query sentence according to the target posterior probability. When a query statement is input, a target fragment cluster is selected from at least one fragment cluster, the target fragment cluster is used as a database, at least one fragment is screened from layers in the target fragment cluster, and the fragment with the maximum target posterior probability and the fragment which is related to the fragment with the maximum target posterior probability in each layer are returned as open domain question-answering results.
Referring to fig. 4, a flowchart of another open domain question-answer prediction method based on a pre-training model provided in the embodiment of the present application in fig. 4 is shown in fig. 4, and includes steps 401-410:
401: the segment data in each field is encoded by adopting a pre-training model, so as to obtain at least one semantic vector;
402: carrying out logarithmic sampling on at least one semantic vector to obtain a first target number of points, and calculating the average distance between the points of the first target number of points;
403: repeatedly executing the operations of carrying out logarithmic sampling on at least one semantic vector and calculating the average distance between the points for K times to obtain K average distances between the points, and taking the average value of the K average distances between the points as the radius of the cluster;
404: carrying out logarithmic sampling on at least one semantic vector to obtain a second target number of points, randomly selecting one point from the second target number of points as a clustering center, and calculating the number of similar points of the clustering center according to the radius and a predefined discriminant function;
405: repeatedly executing operations of carrying out logarithmic sampling on at least one semantic vector for K times, randomly selecting a point as a clustering center and calculating the number of similar points of the clustering center to obtain K number values, and taking the average value of the K number values as a neighborhood density threshold value of the clustering;
406: encoding an input query sentence by adopting a pre-training model to obtain a query vector of the query sentence;
407: matching the query vector with at least one fragment cluster in a pre-constructed cluster map to determine a target fragment cluster to which the query statement belongs from the at least one fragment cluster; the at least one fragment cluster is obtained by clustering based on a radius and a neighborhood density threshold;
408: selecting at least one fragment from the target fragment cluster, obtaining an updated query statement according to the at least one fragment, and calculating the posterior probability of the updated query statement and the first fragment in the target fragment cluster;
409: repeatedly executing the operations of selecting at least one fragment from the target fragment cluster according to the posterior probability obtained last time and obtaining the current updated query statement according to the at least one fragment until no fragment directly connected with the at least one currently selected fragment exists in the target fragment cluster;
410: and calculating the target posterior probability of the second segment in the current updated query sentence and target segment cluster, and returning an open domain question-answering result of the query sentence according to the target posterior probability.
The specific implementation of steps 401-410 is described in the embodiment shown in fig. 2, and the same or similar advantages can be achieved, so that repetition is avoided and detailed description is omitted here.
For a description of the embodiment of the open domain question-answer prediction method based on the pre-training model, please refer to fig. 5, fig. 5 is a schematic structural diagram of an open domain question-answer prediction device based on the pre-training model provided in the embodiment of the present application, as shown in fig. 5, the device includes:
an encoding unit 501, configured to encode an input query sentence by using a pre-training model, to obtain a query vector of the query sentence;
the matching unit 502 is configured to match the query vector with at least one segment cluster in the pre-constructed cluster map, so as to determine a target segment cluster to which the query statement belongs from the at least one segment cluster;
an updating unit 503, configured to select at least one segment from the target segment cluster, obtain an updated query statement according to the at least one segment, and calculate a posterior probability of the updated query statement and a first segment in the target segment cluster;
the updating unit 503 is further configured to repeatedly perform an operation of selecting at least one segment from the target segment cluster according to the posterior probability obtained last time, and obtaining a current updated query statement according to the at least one segment, until no segment directly connected to the at least one currently selected segment exists in the target segment cluster;
And a prediction unit 504, configured to calculate a target posterior probability of the second segment in the current updated query sentence and the target segment cluster, and return an open domain question-answer result of the query sentence according to the target posterior probability.
It can be seen that, in the open domain question-answer prediction device based on the pre-training model shown in fig. 5, the pre-training model is adopted to encode the input query sentence, so as to obtain the query vector of the query sentence; matching the query vector with at least one fragment cluster in a pre-constructed cluster map to determine a target fragment cluster to which the query statement belongs from the at least one fragment cluster; selecting at least one fragment from the target fragment cluster, obtaining an updated query statement according to the at least one fragment, and calculating the posterior probability of the updated query statement and the first fragment in the target fragment cluster; repeatedly executing the operations of selecting at least one fragment from the target fragment cluster according to the posterior probability obtained last time and obtaining the current updated query statement according to the at least one fragment until no fragment directly connected with the at least one currently selected fragment exists in the target fragment cluster; and calculating the target posterior probability of the second segment in the current updated query sentence and target segment cluster, and returning an open domain question-answering result of the query sentence according to the target posterior probability. When a query statement is input, a target fragment cluster is selected from at least one fragment cluster, the target fragment cluster is used as a database, at least one fragment is screened from layers in the target fragment cluster, and the fragment with the maximum target posterior probability and the fragment which is related to the fragment with the maximum target posterior probability in each layer are returned as open domain question-answering results.
In a possible implementation manner, at least one fragment cluster is obtained by clustering fragment data in each field, and the encoding unit 501 is further configured to:
determining a radius and a neighborhood density threshold value adopted by clustering the segment data in each field in a clustering algorithm;
and constructing a cluster map based on the radius and the neighborhood density threshold.
In one possible implementation, the encoding unit 501 is specifically configured to, in determining a radius used for clustering segment data of each domain in the clustering algorithm:
the segment data in each field is encoded by adopting a pre-training model, so as to obtain at least one semantic vector;
carrying out logarithmic sampling on at least one semantic vector to obtain a first target number of points;
calculating the average distance between the points of the first target number of points;
repeatedly executing K times to perform logarithmic sampling on at least one semantic vector and calculating the average distance between points to obtain K average distances between points, wherein K is an integer greater than 1;
the average value of the average distance between K points is taken as the radius.
In one possible implementation, in determining a neighborhood density threshold value used for clustering segment data of each domain in the clustering algorithm, the encoding unit 501 is specifically configured to:
Carrying out logarithmic sampling on at least one semantic vector to obtain a second target number of points;
randomly selecting one point from the second target number of points as a clustering center, and calculating the number of similar points of the clustering center according to the radius and a predefined discriminant function;
repeatedly executing operations of carrying out logarithmic sampling on at least one semantic vector for K times, randomly selecting one point as a clustering center and calculating the number of similar points of the clustering center to obtain K number values;
the average value of the K number values is taken as a neighborhood density threshold value.
In a possible implementation, the coding unit 501 is specifically configured to, in constructing the cluster map based on a radius and a neighborhood density threshold:
starting from any semantic vector in at least one semantic vector, acquiring the number of neighborhood points of the any semantic vector according to the radius, determining the any semantic vector as a boundary point if the number of the neighborhood points is smaller than a neighborhood density threshold value, and determining the any semantic vector as a core point if the number of the neighborhood points is larger than or equal to the neighborhood density threshold value;
if any semantic vector is a core point, determining a point with reachable any semantic vector density and any semantic vector density as a fragment cluster, and if any semantic vector is a boundary point, adding any semantic vector into the fragment cluster to which the core point with reachable any semantic vector density belongs until the core point in at least one semantic vector is clustered to obtain at least one fragment cluster;
And giving an edge to the neighborhood point in each segment cluster in at least one segment cluster to obtain a cluster map.
In one possible implementation manner, in matching the query vector with at least one segment cluster in the pre-constructed cluster map to determine, from the at least one segment cluster, a target segment cluster to which the query statement belongs, the matching unit 502 is specifically configured to:
calculating an average value of core points in each segment cluster for each segment cluster of the at least one segment cluster;
taking the average value of core points in each segment cluster as the clustering center of each segment cluster;
calculating the target distance between the query vector and the clustering center of each fragment cluster;
and determining the fragment cluster represented by the cluster center with the smallest target distance in at least one fragment cluster as a target fragment cluster.
In one possible implementation, in encoding an input query term using a pre-training model, the encoding unit 501 is specifically configured to:
preprocessing the query sentence to obtain a word vector of the query sentence;
obtaining a query matrix, a key matrix and a value matrix based on word vector calculation;
calculating attention weights based on the query matrix, the key matrix and the value matrix;
The attention weight is multiplied by the value matrix to obtain an attention vector, and the attention vector is encoded to obtain a query vector.
According to one embodiment of the present application, each unit of the open domain question-answer prediction apparatus based on the pre-training model shown in fig. 5 may be separately or completely combined into one or several additional units, or some unit(s) thereof may be further split into a plurality of units with smaller functions to form the same operation, which may not affect the implementation of the technical effects of the embodiments of the present application. The above units are divided based on logic functions, and in practical applications, the functions of one unit may be implemented by a plurality of units, or the functions of a plurality of units may be implemented by one unit. In other embodiments of the present application, the open-domain question-answer prediction apparatus based on the pre-training model may also include other units, and in practical applications, these functions may also be implemented with assistance of other units, and may be implemented by cooperation of multiple units.
According to another embodiment of the present application, the open-domain question-answer prediction apparatus device based on a pre-training model as shown in fig. 5 may be constructed by running a computer program (including a program code) capable of executing the steps involved in the respective methods as shown in fig. 2 or fig. 4 on a general-purpose computing device such as a computer including a Central Processing Unit (CPU), a random access storage medium (RAM), a read-only storage medium (ROM), etc., processing elements and storage elements, and implementing the open-domain question-answer prediction method based on a pre-training model of the embodiments of the present application. The computer program may be recorded on, for example, a computer-readable recording medium, and loaded into and executed by the above-described computing device via the computer-readable recording medium.
Based on the description of the method embodiment and the device embodiment, the embodiment of the application also provides electronic equipment. Referring to fig. 6, the electronic device includes at least a processor 601, an input device 602, an output device 603, and a computer storage medium 604. Wherein the processor 601, input device 602, output device 603, and computer storage medium 604 within the electronic device may be connected by a bus or other means.
The computer storage medium 604 may be stored in a memory of an electronic device, the computer storage medium 604 being for storing a computer program comprising program instructions, the processor 601 being for executing the program instructions stored by the computer storage medium 604. The processor 601 (or CPU (Central Processing Unit, central processing unit)) is a computing core as well as a control core of the electronic device, which is adapted to implement one or more instructions, in particular to load and execute one or more instructions to implement a corresponding method flow or a corresponding function.
In one embodiment, the processor 601 of the electronic device provided in the embodiments of the present application may be configured to perform a series of open-domain question-answer prediction processes based on a pre-trained model:
Encoding an input query sentence by adopting a pre-training model to obtain a query vector of the query sentence;
matching the query vector with at least one fragment cluster in a pre-constructed cluster map to determine a target fragment cluster to which the query statement belongs from the at least one fragment cluster;
selecting at least one fragment from the target fragment cluster, obtaining an updated query statement according to the at least one fragment, and calculating the posterior probability of the updated query statement and the first fragment in the target fragment cluster;
repeatedly executing the operations of selecting at least one fragment from the target fragment cluster according to the posterior probability obtained last time and obtaining the current updated query statement according to the at least one fragment until no fragment directly connected with the at least one currently selected fragment exists in the target fragment cluster;
and calculating the target posterior probability of the second segment in the current updated query sentence and target segment cluster, and returning an open domain question-answering result of the query sentence according to the target posterior probability.
It can be seen that, in the electronic device shown in fig. 6, the query vector of the query statement is obtained by encoding the input query statement by using the pre-training model; matching the query vector with at least one fragment cluster in a pre-constructed cluster map to determine a target fragment cluster to which the query statement belongs from the at least one fragment cluster; selecting at least one fragment from the target fragment cluster, obtaining an updated query statement according to the at least one fragment, and calculating the posterior probability of the updated query statement and the first fragment in the target fragment cluster; repeatedly executing the operations of selecting at least one fragment from the target fragment cluster according to the posterior probability obtained last time and obtaining the current updated query statement according to the at least one fragment until no fragment directly connected with the at least one currently selected fragment exists in the target fragment cluster; and calculating the target posterior probability of the second segment in the current updated query sentence and target segment cluster, and returning an open domain question-answering result of the query sentence according to the target posterior probability. When a query statement is input, a target fragment cluster is selected from at least one fragment cluster, the target fragment cluster is used as a database, at least one fragment is screened from layers in the target fragment cluster, and the fragment with the maximum target posterior probability and the fragment which is related to the fragment with the maximum target posterior probability in each layer are returned as open domain question-answering results.
In yet another embodiment, the at least one segment cluster is obtained by clustering segment data of each domain, and the processor 601 is further configured to perform, before encoding the input query term using the pre-training model to obtain a query vector of the query term:
determining a radius and a neighborhood density threshold value adopted by clustering the segment data in each field in a clustering algorithm;
and constructing a cluster map based on the radius and the neighborhood density threshold.
In yet another embodiment, the processor 601 performs determining a radius employed in a clustering algorithm to cluster segment data for each domain, including:
the segment data in each field is encoded by adopting a pre-training model, so as to obtain at least one semantic vector;
carrying out logarithmic sampling on at least one semantic vector to obtain a first target number of points;
calculating the average distance between the points of the first target number of points;
repeatedly executing K times to perform logarithmic sampling on at least one semantic vector and calculating the average distance between points to obtain K average distances between points, wherein K is an integer greater than 1;
the average value of the average distance between K points is taken as the radius.
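A minimal sketch of this radius heuristic, assuming that "logarithmic sampling" means drawing roughly log2(N) points without replacement and that distances are Euclidean (neither is specified in the text):

```python
import numpy as np

def estimate_radius(vectors, K=5, seed=0):
    """K rounds of logarithmic sampling; per round, average the pairwise
    distances of the sampled points; the radius is the mean of the K
    round averages."""
    rng = np.random.default_rng(seed)
    n = len(vectors)
    m = max(2, int(np.log2(n)))        # first target number of points (assumed)
    round_means = []
    for _ in range(K):
        idx = rng.choice(n, size=m, replace=False)
        pts = vectors[idx]
        # full pairwise distance matrix of the sample
        d = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
        iu = np.triu_indices(m, k=1)   # each unordered pair once
        round_means.append(d[iu].mean())
    return float(np.mean(round_means))
```

Sampling only O(log N) points keeps each round cheap even when the corpus of semantic vectors is large, which is presumably the motivation for this scheme.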
In yet another embodiment, the processor 601 performs determining a neighborhood density threshold for clustering segment data for each domain in a clustering algorithm, comprising:
Carrying out logarithmic sampling on at least one semantic vector to obtain a second target number of points;
randomly selecting one point from the second target number of points as a clustering center, and calculating the number of similar points of the clustering center according to the radius and a predefined discriminant function;
repeatedly executing, K times, the operations of performing logarithmic sampling on at least one semantic vector, randomly selecting one point as a clustering center, and calculating the number of similar points of the clustering center, so as to obtain K number values;
the average value of the K number values is taken as a neighborhood density threshold value.
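Under the same sampling assumption, the neighborhood density threshold heuristic might look like the following; the discriminant function is assumed here to be a Euclidean distance test against the radius, since the text only says it is predefined:

```python
import numpy as np

def estimate_density_threshold(vectors, radius, K=5, seed=0):
    """K rounds of logarithmic sampling; per round, pick one sampled
    point as the clustering center and count the other sampled points
    accepted by the (assumed) distance-based discriminant; the threshold
    is the mean of the K counts."""
    rng = np.random.default_rng(seed)
    n = len(vectors)
    m = max(2, int(np.log2(n)))            # second target number of points (assumed)
    counts = []
    for _ in range(K):
        idx = rng.choice(n, size=m, replace=False)
        pts = vectors[idx]
        centre = pts[rng.integers(m)]      # random clustering center
        dists = np.linalg.norm(pts - centre, axis=1)
        counts.append(int((dists <= radius).sum()) - 1)  # exclude the centre itself
    return float(np.mean(counts))
```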
In yet another embodiment, processor 601 performs constructing the cluster map based on radius and neighborhood density thresholds, comprising:
starting from any semantic vector of the at least one semantic vector, acquiring the number of neighborhood points of the semantic vector according to the radius; determining the semantic vector as a boundary point if the number of neighborhood points is smaller than the neighborhood density threshold value, and as a core point if the number of neighborhood points is greater than or equal to the neighborhood density threshold value;
if the semantic vector is a core point, determining the semantic vector together with all points density-reachable from it as a fragment cluster; if the semantic vector is a boundary point, adding the semantic vector to the fragment cluster to which a core point from which it is density-reachable belongs; repeating this process until all core points in the at least one semantic vector have been clustered, so as to obtain at least one fragment cluster;
and adding an edge between each point and its neighborhood points within each fragment cluster of the at least one fragment cluster, so as to obtain a cluster map.
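This construction is essentially DBSCAN-style density clustering followed by edge insertion between in-cluster neighbors. A compact sketch (Euclidean distances and an all-pairs distance matrix are implementation assumptions):

```python
import numpy as np
from collections import deque

def build_cluster_map(vectors, radius, min_pts):
    """Expand clusters from core points (>= min_pts neighbours within
    radius); boundary points join the cluster of a core point they are
    density-reachable from; then add an edge between every point and
    its in-cluster neighbourhood points."""
    n = len(vectors)
    dist = np.linalg.norm(vectors[:, None] - vectors[None, :], axis=-1)
    neigh = [set(np.flatnonzero(dist[i] <= radius)) - {i} for i in range(n)]
    core = [len(neigh[i]) >= min_pts for i in range(n)]
    labels = [-1] * n
    cluster = 0
    for i in range(n):
        if labels[i] != -1 or not core[i]:
            continue
        q = deque([i])                     # expand one fragment cluster
        labels[i] = cluster
        while q:
            p = q.popleft()
            for j in neigh[p]:
                if labels[j] == -1:
                    labels[j] = cluster    # boundary or core point joins
                    if core[j]:
                        q.append(j)        # only core points keep expanding
        cluster += 1
    # edges only between neighbourhood points inside the same cluster
    edges = {(i, j) for i in range(n) for j in neigh[i]
             if i < j and labels[i] == labels[j] and labels[i] != -1}
    return labels, edges
```

The edge set is what turns the flat clustering into the "cluster map": the directly-connected relation used later to decide when fragment selection stops.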
In yet another embodiment, the processor 601 performs matching of the query vector with at least one segment cluster in the pre-constructed cluster map to determine a target segment cluster to which the query statement belongs from the at least one segment cluster, including:
calculating an average value of core points in each segment cluster for each segment cluster of the at least one segment cluster;
taking the average value of core points in each segment cluster as the clustering center of each segment cluster;
calculating the target distance between the query vector and the clustering center of each fragment cluster;
and determining the fragment cluster represented by the cluster center with the smallest target distance in at least one fragment cluster as a target fragment cluster.
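Target-cluster selection thus reduces to a nearest-centroid test over core points. A sketch assuming Euclidean target distance and clusters supplied as arrays of core-point vectors:

```python
import numpy as np

def pick_target_cluster(query_vec, clusters):
    """Each cluster centre is the mean of its core points; the cluster
    whose centre has the smallest target distance to the query vector
    is the target fragment cluster."""
    centres = [np.mean(core_points, axis=0) for core_points in clusters]
    dists = [np.linalg.norm(query_vec - c) for c in centres]
    return int(np.argmin(dists))
```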
In yet another embodiment, the processor 601 performs encoding of an input query statement using a pre-training model to obtain a query vector of the query statement, comprising:
preprocessing the query sentence to obtain a word vector of the query sentence;
obtaining a query matrix, a key matrix and a value matrix based on word vector calculation;
calculating attention weights based on the query matrix, the key matrix and the value matrix;
multiplying the attention weight by the value matrix to obtain an attention vector, and encoding the attention vector to obtain the query vector.
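The encoding steps above amount to single-head scaled dot-product attention. The sketch below assumes learned projection matrices Wq, Wk, Wv and uses mean pooling as a stand-in for the final encoding step, which the text leaves unspecified:

```python
import numpy as np

def encode_query(word_vecs, Wq, Wk, Wv):
    """Project word vectors into query/key/value matrices, compute the
    attention weights, multiply by the value matrix, then pool the
    attention vectors into a single query vector."""
    Q, K, V = word_vecs @ Wq, word_vecs @ Wk, word_vecs @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])         # scaled dot product
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    attn = weights @ V                              # attention vectors
    return attn.mean(axis=0)                        # pooled query vector (assumed)
```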
By way of example, the electronic device includes, but is not limited to, a processor 601, an input device 602, an output device 603, and a computer storage medium 604, and may also include a memory, a power supply, an application client module, and the like. The input device 602 may be a keyboard, a touch screen, a radio frequency receiver, or the like, and the output device 603 may be a speaker, a display, a radio frequency transmitter, or the like. It will be appreciated by those skilled in the art that the schematic diagram is merely an example of the electronic device and does not limit it; the electronic device may include more or fewer components than shown, may combine certain components, or may use different components.
It should be noted that, since the steps in the above-mentioned open-domain question-answer prediction method based on the pre-training model are implemented when the processor 601 of the electronic device executes the computer program, the embodiments of the above-mentioned open-domain question-answer prediction method based on the pre-training model are applicable to the electronic device, and the same or similar beneficial effects can be achieved.
The embodiment of the application also provides a computer storage medium (Memory), which is a Memory device in the electronic device and is used for storing programs and data. It will be appreciated that the computer storage medium herein may include both a built-in storage medium in the terminal and an extended storage medium supported by the terminal. The computer storage medium provides a storage space that stores an operating system of the terminal. Also stored in this memory space are one or more instructions, which may be one or more computer programs (including program code), adapted to be loaded and executed by the processor 601. The computer storage medium herein may be a high-speed RAM memory or a non-volatile memory (non-volatile memory), such as at least one magnetic disk memory; alternatively, it may be at least one computer storage medium located remotely from the processor 601. In one embodiment, one or more instructions stored in a computer storage medium may be loaded and executed by processor 601 to implement the corresponding steps described above with respect to the pre-trained model-based open-domain question-answer prediction method.
The computer program of the computer storage medium may illustratively include computer program code, which may be in source code form, object code form, an executable file, some intermediate form, or the like. The computer readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a Read-Only Memory (ROM), a Random Access Memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and so forth.
It should be noted that, since the steps in the open-domain question-answer prediction method based on the pre-training model are implemented when the computer program of the computer storage medium is executed by the processor, all embodiments of the open-domain question-answer prediction method based on the pre-training model are applicable to the computer storage medium, and the same or similar beneficial effects can be achieved.
The foregoing has described the embodiments of the present application in detail, and specific examples have been applied herein to illustrate the principles and implementations of the present application; the above description of the embodiments is provided solely to assist in understanding the method of the present application and its core ideas. Meanwhile, a person skilled in the art may make changes to the specific embodiments and the application scope in accordance with the ideas of the present application. In view of the foregoing, the content of this description should not be construed as limiting the present application.
Claims (10)
1. An open domain question-answer prediction method based on a pre-training model, which is characterized by comprising the following steps:
encoding an input query sentence by adopting a pre-training model to obtain a query vector of the query sentence;
matching the query vector with at least one fragment cluster in a pre-constructed cluster map to determine a target fragment cluster to which the query statement belongs from the at least one fragment cluster;
selecting at least one segment from the target segment cluster, obtaining an updated query statement according to the at least one segment, and calculating the posterior probability between the updated query statement and the first segment in the target segment cluster; the first segment refers to the segments in the target segment cluster other than the at least one selected segment;
the selecting at least one segment from the target segment cluster, and obtaining an updated query statement according to the at least one segment includes:
calculating posterior probability of each segment in the query vector and the target segment cluster; selecting at least one segment with a posterior probability greater than or equal to a preset value from the target segment cluster; combining the selected at least one segment with the query statement to obtain an updated query statement;
repeatedly executing the operations of selecting at least one fragment from the target fragment cluster according to the posterior probability obtained last time and obtaining a current updated query statement according to the at least one fragment until no fragment directly connected with the at least one currently selected fragment exists in the target fragment cluster;
the repeatedly executing the operations of selecting at least one segment from the target segment cluster according to the posterior probability obtained last time and obtaining the current updated query statement according to the at least one segment comprises the following steps:
selecting at least one segment with the posterior probability greater than or equal to a preset value from the first segment according to the posterior probability obtained by the last calculation; combining the selected at least one fragment with the last updated query sentence to obtain a current updated query sentence;
calculating the target posterior probability between the current updated query statement and the second segment in the target segment cluster, and returning an open domain question-answer result of the query statement according to the target posterior probability; the second segment refers to the segments in the target segment cluster other than the segments currently combined with the last updated query statement.
2. The method of claim 1, wherein the at least one fragment cluster is obtained by clustering fragment data of each domain, and wherein prior to encoding an input query term using a pre-training model to obtain a query vector for the query term, the method further comprises:
determining a radius and a neighborhood density threshold value adopted by clustering the segment data in each field in a clustering algorithm;
and constructing the cluster map based on the radius and the neighborhood density threshold.
3. The method according to claim 2, wherein determining the radius used for clustering segment data of each domain in the clustering algorithm comprises:
encoding the fragment data in each field by adopting the pre-training model to obtain at least one semantic vector;
carrying out logarithmic sampling on the at least one semantic vector to obtain a first target number of points;
calculating the average distance between the points of the first target number of points;
repeatedly executing, K times, the operations of performing logarithmic sampling on the at least one semantic vector and calculating the average distance between points, so as to obtain K average inter-point distances, wherein K is an integer greater than 1;
And taking the average value of the average distances among the K points as the radius.
4. A method according to claim 3, wherein determining a neighborhood density threshold for clustering segment data for each domain in the clustering algorithm comprises:
carrying out logarithmic sampling on the at least one semantic vector to obtain a second target number of points;
randomly selecting one point from the second target number of points as a clustering center, and calculating the number of similar points of the clustering center according to the radius and a predefined discriminant function;
repeatedly executing, K times, the operations of performing logarithmic sampling on the at least one semantic vector, randomly selecting a point as a clustering center, and calculating the number of similar points of the clustering center, so as to obtain K number values;
and taking the average value of the K number values as the neighborhood density threshold value.
5. A method according to claim 3, wherein said constructing said cluster map based on said radius and said neighborhood density threshold comprises:
starting from any semantic vector of the at least one semantic vector, acquiring the number of neighborhood points of the semantic vector according to the radius; determining the semantic vector as a boundary point if the number of neighborhood points is smaller than the neighborhood density threshold, and as a core point if the number of neighborhood points is greater than or equal to the neighborhood density threshold;
if the semantic vector is a core point, determining the semantic vector together with all points density-reachable from it as a fragment cluster; if the semantic vector is a boundary point, adding the semantic vector to the fragment cluster to which a core point from which it is density-reachable belongs; repeating this process until all core points in the at least one semantic vector have been clustered, so as to obtain at least one fragment cluster;
and adding an edge between each point and its neighborhood points within each fragment cluster of the at least one fragment cluster, so as to obtain the cluster map.
6. The method according to any one of claims 1-4, wherein said matching the query vector with at least one segment cluster in a pre-constructed cluster map to determine a target segment cluster to which the query statement belongs from the at least one segment cluster comprises:
calculating, for each of the at least one segment cluster, an average value of core points in the each segment cluster;
taking the average value of the core points in each segment cluster as the clustering center of each segment cluster;
calculating the target distance between the query vector and the clustering center of each fragment cluster;
And determining the fragment cluster represented by the cluster center with the smallest target distance in the at least one fragment cluster as the target fragment cluster.
7. The method of claim 1, wherein the encoding the input query term using the pre-training model to obtain a query vector for the query term comprises:
preprocessing the query sentence to obtain a word vector of the query sentence;
obtaining a query matrix, a key matrix and a value matrix based on the word vector calculation;
calculating attention weights based on the query matrix, the key matrix and the value matrix;
multiplying the attention weight by the value matrix to obtain an attention vector, and encoding the attention vector to obtain the query vector.
8. An open domain question-answer prediction apparatus based on a pre-training model, the apparatus comprising:
the coding unit is used for coding the input query statement by adopting a pre-training model to obtain a query vector of the query statement;
the matching unit is used for matching the query vector with at least one fragment cluster in a pre-constructed cluster map so as to determine a target fragment cluster to which the query statement belongs from the at least one fragment cluster;
the updating unit is used for selecting at least one fragment from the target fragment cluster, obtaining an updated query statement according to the at least one fragment, and calculating the posterior probability between the updated query statement and the first fragment in the target fragment cluster; the first fragment refers to the fragments in the target fragment cluster other than the at least one selected fragment;
the updating unit is specifically configured to: calculating posterior probability of each segment in the query vector and the target segment cluster; selecting at least one segment with a posterior probability greater than or equal to a preset value from the target segment cluster; combining the selected at least one segment with the query statement to obtain an updated query statement;
the updating unit is further configured to repeatedly perform an operation of selecting at least one segment from the target segment cluster according to the posterior probability obtained last time, and obtaining a current updated query statement according to the at least one segment until no segment directly connected to the at least one currently selected segment exists in the target segment cluster;
the updating unit is specifically configured to: selecting at least one segment with the posterior probability greater than or equal to a preset value from the first segment according to the posterior probability obtained by the last calculation; combining the selected at least one fragment with the last updated query sentence to obtain a current updated query sentence;
the prediction unit is used for calculating the target posterior probability between the current updated query statement and the second segment in the target segment cluster, and returning an open domain question-answer result of the query statement according to the target posterior probability; the second segment refers to the segments in the target segment cluster other than the segments currently combined with the last updated query statement.
9. An electronic device comprising an input device and an output device, further comprising:
a processor adapted to implement one or more instructions; the method comprises the steps of,
a computer storage medium storing one or more instructions adapted to be loaded by the processor and to perform the method of any one of claims 1-7.
10. A computer storage medium storing one or more instructions adapted to be loaded by a processor and to perform the method of any one of claims 1-7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111167748.7A CN113723115B (en) | 2021-09-30 | 2021-09-30 | Open domain question-answer prediction method based on pre-training model and related equipment |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113723115A CN113723115A (en) | 2021-11-30 |
CN113723115B true CN113723115B (en) | 2024-02-09 |
Family
ID=78685636
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111167748.7A Active CN113723115B (en) | 2021-09-30 | 2021-09-30 | Open domain question-answer prediction method based on pre-training model and related equipment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113723115B (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114238329A (en) * | 2021-12-15 | 2022-03-25 | 平安科技(深圳)有限公司 | Vector similarity calculation method, device, equipment and storage medium |
CN115687031A (en) * | 2022-11-15 | 2023-02-03 | 北京优特捷信息技术有限公司 | Method, device, equipment and medium for generating alarm description text |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110750629A (en) * | 2019-09-18 | 2020-02-04 | 平安科技(深圳)有限公司 | Robot dialogue generation method and device, readable storage medium and robot |
CN112487173A (en) * | 2020-12-18 | 2021-03-12 | 北京百度网讯科技有限公司 | Man-machine conversation method, device and storage medium |
KR20210051523A (en) * | 2019-10-30 | 2021-05-10 | 주식회사 솔트룩스 | Dialogue system by automatic domain classfication |
CN113139042A (en) * | 2021-04-25 | 2021-07-20 | 内蒙古工业大学 | Emotion controllable reply generation method using fine-tuning and reordering strategy |
WO2021169842A1 (en) * | 2020-02-24 | 2021-09-02 | 京东方科技集团股份有限公司 | Method and apparatus for updating data, electronic device, and computer readable storage medium |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9146987B2 (en) * | 2013-06-04 | 2015-09-29 | International Business Machines Corporation | Clustering based question set generation for training and testing of a question and answer system |
CN103914548B (en) * | 2014-04-10 | 2018-01-09 | 北京百度网讯科技有限公司 | Information search method and device |
US10423649B2 (en) * | 2017-04-06 | 2019-09-24 | International Business Machines Corporation | Natural question generation from query data using natural language processing system |
Non-Patent Citations (1)
Title |
---|
Tianjiao Guo. "Course Question Answering System Based on Artificial Intelligence." Application of Intelligent Systems in Multi-modal Information Analytics: 2021 International Conference on Multi-modal Information Analytics (MMIA 2021), Advances in Intelligent Systems and Computing, vol. 2, pp. 723-730. *
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||