CN113723115A - Open domain question-answer prediction method based on pre-training model and related equipment - Google Patents
- Publication number
- CN113723115A (application number CN202111167748.7A)
- Authority
- CN
- China
- Prior art keywords
- cluster
- fragment
- target
- segment
- query statement
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
- G06F40/35—Discourse or dialogue representation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/242—Query formulation
- G06F16/2433—Query languages
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/335—Filtering based on additional data, e.g. user or group profiles
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/36—Creation of semantic tools, e.g. ontology or thesauri
- G06F16/367—Ontology
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/284—Lexical analysis, e.g. tokenisation or collocates
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
The application relates to the technical field of artificial intelligence, and provides an open-domain question-answer prediction method based on a pre-training model, together with related equipment. The method comprises the following steps: encoding a query statement to obtain a query vector; matching the query vector against at least one segment cluster to determine the target segment cluster to which the query statement belongs; selecting at least one segment from the target segment cluster, obtaining an updated query statement from the at least one segment, and calculating the posterior probability of the updated query statement and the segments in the target segment cluster; repeating the operations of selecting at least one segment according to the posterior probability and updating the query statement accordingly, until the target segment cluster contains no segment directly connected to the currently selected segment(s); and calculating the posterior probability of the latest query statement and the segments in the target segment cluster, and returning a question-answer result according to that posterior probability. The method helps to improve prediction efficiency in open-domain question answering.
Description
Technical Field
The application relates to the technical field of intelligent question answering, in particular to an open domain question answering prediction method based on a pre-training model and related equipment.
Background
With the development of the internet, business volume in various industries has increased sharply and the customer base has gradually shifted from offline to online, while the number of human customer-service agents and the processing capacity of each enterprise are far from keeping up with the growth of online customers, so intelligent question-answering systems are urgently needed to alleviate this situation. Most existing intelligent question-answering systems are based on closed domains, i.e. the question-and-answer knowledge base is limited to a certain specific field, such as banking, insurance or consultation services. Driven by customer demand, researchers have proposed open-domain question answering (open-domain QA), which is not limited to a single field but learns knowledge from massive text documents across industries (such as knowledge bases like Wikipedia), so that questions from any field can be answered. In an existing open-domain question-answering system, the posterior probability between a query statement and each of a massive number of segments needs to be calculated one by one, and the high-probability segments extracted, which is computationally expensive and makes prediction inefficient.
Disclosure of Invention
To solve the above problems, the application provides an open-domain question-answer prediction method based on a pre-training model and related equipment, which helps to improve prediction efficiency in open-domain question answering.
In order to achieve the above object, a first aspect of the embodiments of the present application provides an open-domain question-answer prediction method based on a pre-training model, where the method includes:
coding the input query statement by adopting a pre-training model to obtain a query vector of the query statement;
matching the query vector with at least one segment cluster in a pre-constructed clustering graph to determine a target segment cluster to which the query statement belongs from the at least one segment cluster;
selecting at least one fragment from the target fragment cluster, obtaining an updated query statement according to the at least one fragment, and calculating the posterior probability of the updated query statement and a first fragment in the target fragment cluster;
repeatedly executing the operation of selecting at least one segment from the target segment cluster according to the last obtained posterior probability and obtaining the current updated query statement according to the at least one segment until no segment directly connected with the at least one currently selected segment exists in the target segment cluster;
and calculating the target posterior probability of the current updated query statement and the second segment in the target segment cluster, and returning the open domain question-answer result of the query statement according to the target posterior probability.
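The five steps above can be pictured end to end with a small sketch. This is a minimal illustration under assumed data structures, not the patented implementation: the clustering graph is represented as plain adjacency lists, and encoding and posterior calculation are abstracted behind a caller-supplied `posterior` scoring function; none of these names come from the patent itself.

```python
import math

def predict(query_vec, clusters, centers, adjacency, posterior, top_k=1):
    """Sketch of the claimed prediction loop (hypothetical helpers).

    clusters:  {cluster_id: [segment_id, ...]}
    centers:   {cluster_id: center_vector}
    adjacency: {segment_id: [directly connected segment_ids]}
    posterior: callable(query_vec, segment_id) -> probability
    """
    # Step 2: match the query vector to the nearest cluster center.
    target = min(centers, key=lambda c: math.dist(query_vec, centers[c]))
    frontier = clusters[target]

    # Steps 3-4: repeatedly pick the best-scoring segment(s) and continue
    # until the selected segments have no directly connected neighbours.
    selected = []
    while frontier:
        scored = sorted(frontier, key=lambda s: posterior(query_vec, s),
                        reverse=True)
        best = scored[:top_k]
        selected.extend(best)
        # "Updating the query" is abstracted here as restricting the frontier
        # to segments directly connected to the current selection.
        frontier = [n for s in best for n in adjacency.get(s, [])
                    if n not in selected]

    # Step 5: return the selected segments ranked by final posterior.
    return sorted(selected, key=lambda s: posterior(query_vec, s), reverse=True)
```

In this reading, restricting the search to one cluster and then to graph neighbours is exactly what keeps the number of posterior evaluations far below scoring the whole corpus.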
With reference to the first aspect, in a possible implementation manner, at least one fragment cluster is obtained by clustering fragment data of each field, and before encoding an input query statement by using a pre-training model to obtain a query vector of the query statement, the method further includes:
determining a radius and a neighborhood density threshold value adopted for clustering fragment data of each field in a clustering algorithm;
and constructing a clustering graph based on the radius and the neighborhood density threshold.
With reference to the first aspect, in a possible implementation manner, the determining a radius used for clustering fragment data of each field in a clustering algorithm includes:
coding fragment data of each field by adopting a pre-training model to obtain at least one semantic vector;
performing logarithmic sampling on the at least one semantic vector to obtain a first target number of points;
calculating the average distance between the first target number of points;
repeating the operations of logarithmic sampling and calculating the average inter-point distance on the at least one semantic vector K times to obtain K average distances, where K is an integer greater than 1;
and taking the mean of the K average distances as the radius.
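The radius estimation steps above can be sketched in standard-library Python. The helper name, the default number of rounds, and the `max`/`min` guards on the sample size are illustrative assumptions, not details from the patent:

```python
import math
import random
from itertools import combinations

def estimate_radius(vectors, k_rounds=5, seed=0):
    """Estimate the neighbourhood radius Eps: average, over K rounds, of the
    mean pairwise Euclidean distance among [ln N] log-sampled points."""
    rng = random.Random(seed)
    # [ln N] points per round, clamped to a workable range (assumption).
    n_sample = min(len(vectors), max(2, round(math.log(len(vectors)))))
    round_means = []
    for _ in range(k_rounds):
        pts = rng.sample(vectors, n_sample)
        dists = [math.dist(p, q) for p, q in combinations(pts, 2)]
        round_means.append(sum(dists) / len(dists))
    return sum(round_means) / k_rounds
```

Sampling only [ln N] points per round keeps the cost logarithmic instead of the O(N²) of an all-pairs distance computation, which is the efficiency argument made later in the description.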
With reference to the first aspect, in a possible implementation manner, determining a neighborhood density threshold used for clustering fragment data of each field in a clustering algorithm includes:
performing logarithmic sampling on the at least one semantic vector to obtain a second target number of points;
randomly selecting one point from the second target number of points as a clustering center, and calculating the number of points of the same class as the clustering center according to the radius and a predefined discriminant function;
repeating the operations of logarithmic sampling, randomly selecting a point as a clustering center and counting the same-class points of the clustering center K times to obtain K count values;
and taking the mean of the K count values as the neighborhood density threshold.
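The density-threshold steps above admit a sketch symmetrical to the radius one (again standard-library Python with illustrative names and defaults; whether the centre counts itself as a same-class point is an assumption, flagged in a comment):

```python
import math
import random

def estimate_minpts(vectors, eps, k_rounds=5, seed=0):
    """Estimate the neighbourhood density threshold MinPts: average, over K
    rounds, of how many sampled points fall within Eps of a random centre."""
    rng = random.Random(seed)
    n_sample = min(len(vectors), max(2, round(math.log(len(vectors)))))
    counts = []
    for _ in range(k_rounds):
        pts = rng.sample(vectors, n_sample)
        centre = rng.choice(pts)
        # Discriminant D(u) = 1 when dist/Eps <= 1, else 0; the centre itself
        # is at distance 0 and so is counted (assumption).
        counts.append(sum(1 for p in pts if math.dist(p, centre) <= eps))
    return sum(counts) / k_rounds
```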
With reference to the first aspect, in one possible implementation manner, constructing the clustering graph based on a radius and a neighborhood density threshold includes:
starting from any semantic vector of the at least one semantic vector, obtaining the number of neighborhood points of that semantic vector according to the radius; if the number of neighborhood points is less than the neighborhood density threshold, marking the semantic vector as a boundary point, and if it is greater than or equal to the threshold, marking it as a core point;
if the semantic vector is a core point, grouping it together with the points density-reachable from it into a segment cluster; if it is a boundary point, adding it to the segment cluster of the core point from which it is density-reachable; repeating until every core point among the at least one semantic vector has been clustered, yielding at least one segment cluster;
and assigning an edge between neighborhood points within each of the at least one segment cluster to obtain the clustering graph.
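The construction above reads as a DBSCAN-style clustering followed by adding an edge between neighbouring points of the same cluster. A compact sketch under that reading (function and variable names are illustrative; the patent does not fix whether a point counts itself among its neighbours — here it does not):

```python
import math

def build_cluster_graph(vectors, eps, minpts):
    """DBSCAN-style sketch: label core/boundary points, grow density-connected
    clusters from core points, then connect neighbouring points per cluster."""
    n = len(vectors)
    neigh = [[j for j in range(n)
              if j != i and math.dist(vectors[i], vectors[j]) <= eps]
             for i in range(n)]
    core = [len(neigh[i]) >= minpts for i in range(n)]

    label = [None] * n
    cluster_id = 0
    for i in range(n):
        if not core[i] or label[i] is not None:
            continue
        # Grow a cluster from this unlabelled core point.
        stack, label[i] = [i], cluster_id
        while stack:
            p = stack.pop()
            for q in neigh[p]:
                if label[q] is None:
                    label[q] = cluster_id   # boundary or core point joins
                    if core[q]:
                        stack.append(q)     # only core points keep expanding
        cluster_id += 1

    # Edges connect neighbouring points that ended up in the same cluster.
    edges = {(i, j) for i in range(n) for j in neigh[i]
             if i < j and label[i] is not None and label[i] == label[j]}
    return label, edges
```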
With reference to the first aspect, in a possible implementation manner, matching the query vector with at least one segment cluster in a pre-constructed clustering graph to determine a target segment cluster to which the query statement belongs from the at least one segment cluster includes:
calculating, for each of the at least one segment cluster, an average of the core points in each segment cluster;
taking the average value of the core points in each fragment cluster as the clustering center of each fragment cluster;
calculating the target distance between the query vector and the clustering center of each fragment cluster;
and determining the segment cluster represented by the cluster center with the minimum target distance in the at least one segment cluster as the target segment cluster.
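The matching steps above amount to nearest-centroid assignment, where each centroid is the mean of a cluster's core points. A small sketch (assumed data layout: each cluster id maps to the list of its core-point vectors):

```python
import math

def match_target_cluster(query_vec, clusters):
    """Pick the target segment cluster: each cluster centre is the mean of its
    core points, and the query is assigned to the nearest centre."""
    def centre(core_points):
        dim = len(core_points[0])
        return [sum(p[d] for p in core_points) / len(core_points)
                for d in range(dim)]
    centres = {cid: centre(core) for cid, core in clusters.items()}
    return min(centres, key=lambda cid: math.dist(query_vec, centres[cid]))
```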
With reference to the first aspect, in a possible implementation manner, encoding an input query statement by using a pre-training model to obtain a query vector of the query statement includes:
preprocessing a query statement to obtain a word vector of the query statement;
calculating based on the word vectors to obtain a query matrix, a key matrix and a value matrix;
calculating to obtain attention weight based on the query matrix, the key matrix and the value matrix;
and multiplying the attention weight by the value matrix to obtain an attention vector, and encoding the attention vector to obtain a query vector.
A second aspect of the embodiments of the present application provides an open domain question-answer prediction apparatus based on a pre-training model, where the apparatus includes:
the coding unit is used for coding the input query statement by adopting a pre-training model to obtain a query vector of the query statement;
the matching unit is used for matching the query vector with at least one segment cluster in a pre-constructed clustering graph so as to determine a target segment cluster to which the query statement belongs from the at least one segment cluster;
the updating unit is used for selecting at least one segment from the target segment cluster, obtaining an updated query statement according to the at least one segment, and calculating the posterior probability of the updated query statement and the first segment in the target segment cluster;
the updating unit is further used for repeatedly executing the operation of selecting at least one fragment from the target fragment cluster according to the last obtained posterior probability and obtaining the current updated query statement according to the at least one fragment until the target fragment cluster does not have the fragment directly connected with the at least one currently selected fragment;
and the prediction unit is used for calculating the target posterior probability of the current updated query statement and the second segment in the target segment cluster, and returning the open domain question-answer result of the query statement according to the target posterior probability.
A third aspect of embodiments of the present application provides an electronic device, which includes an input device, an output device, and a processor, and is adapted to implement one or more instructions; and a computer storage medium storing one or more instructions adapted to be loaded by the processor and to perform the steps of:
coding the input query statement by adopting a pre-training model to obtain a query vector of the query statement;
matching the query vector with at least one segment cluster in a pre-constructed clustering graph to determine a target segment cluster to which the query statement belongs from the at least one segment cluster;
selecting at least one fragment from the target fragment cluster, obtaining an updated query statement according to the at least one fragment, and calculating the posterior probability of the updated query statement and a first fragment in the target fragment cluster;
repeatedly executing the operation of selecting at least one segment from the target segment cluster according to the last obtained posterior probability and obtaining the current updated query statement according to the at least one segment until no segment directly connected with the at least one currently selected segment exists in the target segment cluster;
and calculating the target posterior probability of the current updated query statement and the second segment in the target segment cluster, and returning the open domain question-answer result of the query statement according to the target posterior probability.
A fourth aspect of embodiments of the present application provides a computer storage medium having one or more instructions stored thereon, the one or more instructions adapted to be loaded by a processor and to perform the following steps:
coding the input query statement by adopting a pre-training model to obtain a query vector of the query statement;
matching the query vector with at least one segment cluster in a pre-constructed clustering graph to determine a target segment cluster to which the query statement belongs from the at least one segment cluster;
selecting at least one fragment from the target fragment cluster, obtaining an updated query statement according to the at least one fragment, and calculating the posterior probability of the updated query statement and a first fragment in the target fragment cluster;
repeatedly executing the operation of selecting at least one segment from the target segment cluster according to the last obtained posterior probability and obtaining the current updated query statement according to the at least one segment until no segment directly connected with the at least one currently selected segment exists in the target segment cluster;
and calculating the target posterior probability of the current updated query statement and the second segment in the target segment cluster, and returning the open domain question-answer result of the query statement according to the target posterior probability.
The above scheme of the present application includes at least the following beneficial effects:
In the embodiment of the application, a pre-training model is used to encode the input query statement to obtain its query vector; the query vector is matched against at least one segment cluster in a pre-constructed clustering graph to determine, from the at least one segment cluster, the target segment cluster to which the query statement belongs; at least one segment is selected from the target segment cluster, an updated query statement is obtained from the at least one segment, and the posterior probability of the updated query statement and a first segment in the target segment cluster is calculated; the operations of selecting at least one segment from the target segment cluster according to the last obtained posterior probability and obtaining the currently updated query statement from the at least one segment are repeated until no segment in the target segment cluster is directly connected to the at least one currently selected segment; and the target posterior probability of the currently updated query statement and a second segment in the target segment cluster is calculated, and the open-domain question-answer result of the query statement is returned according to the target posterior probability. That is, the segment data is first clustered to obtain a clustering graph; when a query statement is input, a target segment cluster is selected from the at least one segment cluster and used as the search space; at least one segment is screened from each layer within the target segment cluster; and the segment with the maximum target posterior probability, together with the segments related to it in each layer, is returned as the open-domain question-answer result.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present application, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.
Fig. 1 is a schematic diagram of an application environment provided in an embodiment of the present application;
fig. 2 is a schematic flowchart of an open-domain question-answer prediction method based on a pre-training model according to an embodiment of the present application;
fig. 3 is a schematic diagram of clustering graph generation provided in an embodiment of the present application;
fig. 4 is a schematic flowchart of another open-domain question-answer prediction method based on a pre-training model according to an embodiment of the present application;
fig. 5 is a schematic structural diagram of an open-domain question-answer prediction apparatus based on a pre-training model according to an embodiment of the present application;
fig. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
In order to make the technical solutions better understood by those skilled in the art, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only partial embodiments of the present application, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The terms "comprising" and "having," and any variations thereof, as appearing in the specification, claims and drawings of this application, are intended to cover non-exclusive inclusions. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those steps or elements listed, but may alternatively include other steps or elements not listed, or inherent to such process, method, article, or apparatus. Furthermore, the terms "first," "second," and "third," etc. are used to distinguish between different objects and are not used to describe a particular order.
An embodiment of the present application provides an open domain question-answer prediction method based on a pre-training model, which may be implemented based on an application environment shown in fig. 1, please refer to fig. 1, where the application environment includes an electronic device and a user device connected to the electronic device through a network. Wherein the user equipment is provided with an input interface for receiving a query sentence input by a user, such as a consultation question sentence of the user for commodity details, and a communication interface for transmitting the query sentence to the electronic equipment. The electronic equipment receives the query statement through a communication interface of the electronic equipment, and transmits the query statement to the processor, so that the processor executes the open domain question-answer prediction method based on the pre-training model. Because the electronic equipment reduces the query range to the target segment cluster, and does not need to query in each segment cluster, the query calculation amount is greatly reduced, and the prediction efficiency in open domain question answering is favorably improved.
For example, the electronic device may be an independent server, or may be a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, web services, cloud communication, middleware services, domain name services, security services, a Content Delivery Network (CDN), and big data and artificial intelligence platforms. The user device may be a smartphone, a computer, a wearable device, a vehicle-mounted device, or the like.
Based on the application environment shown in fig. 1, the following describes in detail an open domain question-answering prediction method based on a pre-training model provided in an embodiment of the present application with reference to other drawings.
Referring to fig. 2, fig. 2 is a schematic flow chart of an open-domain question-answer prediction method based on a pre-training model according to an embodiment of the present application. The method is applied to an electronic device and, as shown in fig. 2, includes steps 201 to 205:
201: and coding the input query statement by adopting a pre-training model to obtain a query vector of the query statement.
In an embodiment of the present application, the pre-training model may be a BERT (Bidirectional Encoder Representations from Transformers) model, where the BERT model is trained and fine-tuned in advance on data from each domain, so that the model can learn deep information in the data of each domain. Illustratively, encoding the input query statement with the pre-training model to obtain the query vector of the query statement includes:
preprocessing a query statement to obtain a word vector of the query statement;
calculating based on the word vectors to obtain a query matrix, a key matrix and a value matrix;
calculating to obtain attention weight based on the query matrix, the key matrix and the value matrix;
and multiplying the attention weight by the value matrix to obtain an attention vector, and encoding the attention vector to obtain a query vector.
It should be understood that the BERT model encodes with a Transformer encoder. The bottom encoder preprocesses the input query statement (Query) to obtain corresponding word vectors; the preprocessing may be, for example, word embedding or one-hot encoding. The self-attention layer of the Transformer encoder constructs a query vector q, a key vector k and a value vector v from each word vector, and multiplies them respectively by the pre-trained weight matrices W_q, W_k and W_v to obtain the query matrix Q, the key matrix K and the value matrix V. The attention weight α is then calculated from the query matrix, the key matrix and the value matrix; finally, the attention weight α is multiplied by the value matrix V to obtain the attention vector output by the self-attention layer, and the attention vector is encoded by a feedforward neural network to obtain the query vector.
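The Q/K/V computation described here can be illustrated with a single-head scaled dot-product attention sketch in plain Python. The weight matrices below are toy stand-ins, not pre-trained BERT parameters, and the scaling by the square root of the key dimension follows the standard Transformer formulation rather than anything stated in this description:

```python
import math

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

def softmax(row):
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    s = sum(exps)
    return [e / s for e in exps]

def self_attention(word_vecs, w_q, w_k, w_v):
    """Single-head scaled dot-product attention over the token vectors:
    Q = X*Wq, K = X*Wk, V = X*Wv, weights = softmax(Q*K^T / sqrt(d)),
    output = weights * V."""
    q = matmul(word_vecs, w_q)
    k = matmul(word_vecs, w_k)
    v = matmul(word_vecs, w_v)
    d = len(k[0])
    scores = [[sum(qi * ki for qi, ki in zip(qr, kr)) / math.sqrt(d)
               for kr in k] for qr in q]
    weights = [softmax(row) for row in scores]   # the attention weight alpha
    return matmul(weights, v)                    # attention vectors
```

With identity weight matrices and one-hot token vectors, each token attends most strongly to itself, which is a quick sanity check on the weighting.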
202: and matching the query vector with at least one segment cluster in a pre-constructed clustering graph to determine a target segment cluster to which the query statement belongs from the at least one segment cluster.
In a specific embodiment of the present application, at least one fragment cluster is obtained by clustering fragment data in each field, and before encoding an input query statement using a pre-training model to obtain a query vector of the query statement, the method further includes:
determining a radius and a neighborhood density threshold value adopted for clustering fragment data of each field in a clustering algorithm;
and constructing a clustering graph based on the radius and the neighborhood density threshold.
For example, determining the radius used for clustering the fragment data of each field in the clustering algorithm includes:
coding fragment data of each field by adopting a pre-training model to obtain at least one semantic vector;
performing logarithmic sampling on the at least one semantic vector to obtain a first target number of points;
calculating the average distance between the first target number of points;
repeating the operations of logarithmic sampling and calculating the average inter-point distance on the at least one semantic vector K times to obtain K average distances, where K is an integer greater than 1;
and taking the mean of the K average distances as the radius.
Specifically, assuming the number of the at least one semantic vector is N, the number of points obtained by each logarithmic sampling is [ln N], i.e. the first target number is [ln N]. For any two of the [ln N] points, P_i and P_j, the Euclidean distance is used as the distance metric, so the average distance between the [ln N] points can be calculated by the following formula (reconstructed here as the mean over all point pairs):

eps = (2 / ([ln N] · ([ln N] − 1))) · Σ_{i<j} dist(P_i, P_j)

where eps represents the average distance between the [ln N] points and dist(P_i, P_j) represents the Euclidean distance between points P_i and P_j.
For the at least one semantic vector, in order to avoid unbalanced sampling, the operations of logarithmic sampling and calculating the average inter-point distance are repeated K times, giving K average distances, and their mean is taken as the neighborhood radius Eps:

Eps = (eps_1 + eps_2 + … + eps_K) / K

where eps_1, eps_2, …, eps_K are the average distances from the 1st, 2nd, …, Kth sampling rounds and Eps is the final neighborhood radius. Sampling [ln N] points is chosen because the number of samples in an open-domain scenario is large, usually tens of millions or more; if all pairwise distances were computed, the amount of calculation would be very large (of order N²). Logarithmic sampling therefore significantly reduces the amount of computation, while repeating the sampling K times addresses the sampling-imbalance problem.
For example, determining a neighborhood density threshold for clustering fragment data of each domain in a clustering algorithm includes:
carrying out logarithmic sampling on at least one semantic vector to obtain a second target number of points;
randomly selecting one point from the second target number of points as a clustering center, and calculating the number of same-category points of the clustering center according to the radius and a predefined discriminant function;
repeatedly performing, K times, the operations of logarithmic sampling on the at least one semantic vector, randomly selecting a point as a clustering center, and calculating the number of same-category points of the clustering center, to obtain K count values;
and taking the average value of the K count values as the neighborhood density threshold.
Specifically, in order to reduce the amount of computation, when the neighborhood density threshold is determined, logarithmic sampling is likewise performed on the at least one semantic vector to obtain [ln N] points, i.e. the second target number is [ln N]. A point X is then randomly selected from the [ln N] points as the clustering center, and the number of points belonging to the same category as the clustering center is calculated based on the previously determined radius parameter Eps and a predefined discriminant function, where the discriminant function is defined as:

D(u) = 1 if u ≤ 1, and D(u) = 0 otherwise

where u represents the ratio of the distance between two points to Eps; the discriminant function D(u) expresses that a point whose distance from the clustering center is no greater than Eps is a same-category point. The number of same-category points is calculated by the following formula:

Count = Σ_{i=1}^{m} D(dist(P_i, X) / Eps), where m = [ln N]

wherein Count represents the number of same-category points of the clustering center X in a single calculation, and dist(P_i, X) represents the Euclidean distance between point P_i and the clustering center X. Similar to the radius parameter Eps, in order to avoid sampling imbalance, the operations of logarithmic sampling, selecting a clustering center, and counting its same-category points are repeated K times to obtain K count values, and the mean of the K count values is calculated as the neighborhood density threshold Minpts, according to the following formula:

Minpts = (Count_1 + Count_2 + … + Count_K) / K

wherein Count_1, Count_2, …, Count_K are the count values obtained in the 1st, 2nd, …, Kth rounds respectively. In this embodiment, similar to the radius parameter Eps, the neighborhood density threshold Minpts is determined adaptively. In clustering, the radius Eps and the neighborhood density threshold Minpts must be determined in advance, and different choices of these two values often yield distinctly different clustering results, thereby affecting the accuracy of the final returned result.
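The adaptive Minpts estimation can be sketched in the same style. Again a hedged illustration with hypothetical names; the discriminant D(u) is applied by directly testing whether each sampled point lies within Eps of the chosen center.

```python
import math
import random

import numpy as np

def estimate_minpts(vectors: np.ndarray, eps: float, K: int = 5, seed: int = 0) -> int:
    """Estimate the neighborhood density threshold Minpts.

    Each round log-samples roughly ln(N) points, picks a random
    clustering center X among them, and counts the sampled points
    within eps of X (i.e. D(dist/eps) = 1); Minpts is the mean count
    over the K rounds.
    """
    rng = random.Random(seed)
    n = len(vectors)
    m = max(2, math.ceil(math.log(n)))  # the second target number, [ln N]
    counts = []
    for _ in range(K):
        sample = vectors[rng.sample(range(n), m)]
        center = sample[rng.randrange(m)]
        # same-category points: distance ratio u = dist/eps is at most 1
        count = sum(1 for p in sample if np.linalg.norm(p - center) <= eps)
        counts.append(count)  # Count_k
    return round(sum(counts) / K)  # Minpts
```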
Illustratively, constructing the clustering graph based on the radius and the neighborhood density threshold comprises:
starting from any semantic vector in at least one semantic vector, acquiring the number of neighborhood points of any semantic vector according to the radius, if the number of the neighborhood points is less than a neighborhood density threshold, determining any semantic vector as a boundary point, and if the number of the neighborhood points is greater than or equal to the neighborhood density threshold, determining any semantic vector as a core point;
if any semantic vector is a core point, determining that semantic vector together with the points density-reachable from it as a fragment cluster; if any semantic vector is a boundary point, adding that semantic vector to the fragment cluster of the core point from which it is density-reachable; continuing until the core points in the at least one semantic vector have all been clustered, so as to obtain at least one fragment cluster;
and assigning an edge to each pair of neighborhood points in each fragment cluster of the at least one fragment cluster to obtain the clustering map.
Specifically, each piece of fragment data corresponds to one semantic vector, and each semantic vector is represented as a point in a high-dimensional space. Taking any semantic vector as a point p, the number of neighborhood points of p is determined according to the preset radius Eps: if the number of neighborhood points of p is smaller than the neighborhood density threshold Minpts, p is a boundary point; if the number of neighborhood points of p is greater than or equal to Minpts, p is a core point. As shown in fig. 3, if the neighborhood density threshold Minpts is 3 and 3 points exist in the neighborhood of point p, then p is a core point, while if only two points exist in the neighborhood of point q, then q is a boundary point. If point p is a core point, a fragment cluster can be determined, and all points density-reachable from p belong to that fragment cluster; if point p is a boundary point, p is assigned to the fragment cluster of the core point from which it is density-reachable. The fragment clusters of all core points are determined in this way, yielding at least one fragment cluster. For each fragment cluster of the at least one fragment cluster, an edge is given to the neighborhood points within the cluster; for example, in fig. 3, point q is in the neighborhood of point p, so an edge is given between p and q, and point s is in the neighborhood of p, so an edge is given between p and s. This yields a graph for each fragment cluster, and these graphs together form the clustering map, which is stored for subsequent matching.
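The core-point/boundary-point clustering with edge assignment described above can be sketched as a DBSCAN-style procedure. This is a brute-force O(N^2) illustration with hypothetical names; a production system over tens of millions of vectors would need an indexed neighbor search.

```python
import numpy as np

def build_cluster_graph(vectors, eps, minpts):
    """DBSCAN-style fragment clustering plus per-cluster edge lists.

    A point with at least `minpts` neighbors within `eps` is a core
    point; clusters grow by density-reachable expansion from core
    points, boundary points join the cluster of a reachable core point,
    and each in-cluster neighborhood pair is given an edge.
    """
    n = len(vectors)
    neighbors = [
        [j for j in range(n) if j != i
         and np.linalg.norm(vectors[i] - vectors[j]) <= eps]
        for i in range(n)
    ]
    core = [len(nb) >= minpts for nb in neighbors]
    labels = [-1] * n  # -1 = not yet assigned to any fragment cluster
    cluster_id = 0
    for i in range(n):
        if not core[i] or labels[i] != -1:
            continue
        labels[i] = cluster_id
        frontier = [i]
        while frontier:  # density-reachable expansion from core points
            p = frontier.pop()
            for q in neighbors[p]:
                if labels[q] == -1:
                    labels[q] = cluster_id  # core or boundary point joins
                    if core[q]:
                        frontier.append(q)
        cluster_id += 1
    # give each in-cluster neighborhood pair an edge
    edges = {c: set() for c in range(cluster_id)}
    for i in range(n):
        if labels[i] == -1:
            continue
        for j in neighbors[i]:
            if labels[j] == labels[i] and i < j:
                edges[labels[i]].add((i, j))
    return labels, edges
```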
Illustratively, matching the query vector with at least one segment cluster in a pre-constructed clustering graph to determine a target segment cluster to which the query statement belongs from the at least one segment cluster, includes:
calculating, for each of the at least one segment cluster, an average of the core points in each segment cluster;
taking the average value of the core points in each fragment cluster as the clustering center of each fragment cluster;
calculating the target distance between the query vector and the clustering center of each fragment cluster;
and determining the segment cluster represented by the cluster center with the minimum target distance in the at least one segment cluster as the target segment cluster.
The cluster center with the minimum target distance to the query vector is the cluster center closest to the query vector, and therefore the query statement belongs to the category represented by that cluster center.
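The nearest-center matching above can be sketched as follows; the function name and the dictionary-of-clusters representation are illustrative assumptions.

```python
import numpy as np

def match_cluster(query_vec, clusters):
    """Pick the target fragment cluster whose center is nearest the query.

    `clusters` maps a cluster id to the array of that cluster's
    core-point vectors; the cluster center is the mean of the core
    points, and the target distance is the Euclidean distance from
    the query vector to that center.
    """
    best_id, best_dist = None, float("inf")
    for cid, core_points in clusters.items():
        center = np.mean(core_points, axis=0)               # cluster center
        dist = float(np.linalg.norm(query_vec - center))    # target distance
        if dist < best_dist:
            best_id, best_dist = cid, dist
    return best_id
```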
203: and selecting at least one fragment from the target fragment cluster, obtaining an updated query statement according to the at least one fragment, and calculating the posterior probability of the updated query statement and the first fragment in the target fragment cluster.
In the embodiment of the application, the target fragment cluster is used as the database for the Query. The posterior probability of the Query vector with each fragment in the target fragment cluster is calculated, the fragments in the target fragment cluster are sorted by posterior probability, and at least one fragment whose posterior probability is greater than or equal to a preset value is selected; for example, the selected fragments are P1, P2 and P3. Each of P1, P2 and P3 is combined with the Query statement to obtain an updated Query statement; for example, P1 yields the updated Query statement Query+P1. The updated Query statement Query+P1 is then used as a new input, and the posterior probability of Query+P1 with each first fragment in the target fragment cluster is calculated, where the first fragments are the fragments in the target fragment cluster other than P1.
204: and repeating the operation of selecting at least one fragment from the target fragment cluster according to the last obtained posterior probability and obtaining the current updated query statement according to the at least one fragment until no fragment directly connected with the at least one currently selected fragment exists in the target fragment cluster.
In the embodiment of the application, according to the posterior probability obtained in the last calculation, the fragments in the target fragment cluster other than P1 are sorted, and at least one fragment whose posterior probability is greater than or equal to the preset value is selected; for example, the selected fragments are P11, P12 and P13. Each of them is combined with the last input Query+P1 to form a currently updated Query statement; for example, P12 yields the currently updated Query statement Query+P1+P12. The above operations are repeated until, on each of the paths started from P1, P2 and P3, the currently selected at least one fragment has no directly connected fragment in the target fragment cluster, that is, analysis of the clustering map shows that no fragment in the target fragment cluster is related to the currently selected at least one fragment.
205: and calculating the target posterior probability of the current updated query statement and the second segment in the target segment cluster, and returning the open domain question-answer result of the query statement according to the target posterior probability.
In the embodiment of the application, assuming that after the currently updated Query statement Query+P1+P12 is formed, no fragment directly connected to P12 exists in the target fragment cluster, the updating of the input is stopped. The target posterior probability of the currently updated Query statement Query+P1+P12 with each second fragment in the target fragment cluster is calculated, where the second fragments are the fragments in the target fragment cluster other than P12. The second fragments are sorted by target posterior probability, the fragment with the maximum target posterior probability, e.g. P115, is selected, and P115, P12 and P1 are taken as the open-domain question-answer result of the Query statement, which is then returned to the user. The above is only an example; in an actual scenario, there are also updated query statements built from P2 and P3, and the maximum target posterior probability refers to the maximum posterior probability over all currently updated query statements. Depending on the required correlation between fragments, the number of fragments selected at each step may be the same or different; for example, as the input is updated further, the calculated posterior probabilities may decrease overall, so the number of selected fragments can follow a decreasing trend to reduce the amount of computation.
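The iterative selection of steps 203 through 205 can be sketched as a graph search over the target fragment cluster. This is a simplified illustration: `posterior(q, s)` is a hypothetical scoring function standing in for the pre-trained model's posterior probability, `adjacency` encodes the clustering map's direct connections, and the path bookkeeping is reduced relative to the patent's description.

```python
def answer_search(query, segments, adjacency, posterior, threshold=0.5):
    """Iterative fragment selection over a target fragment cluster.

    Fragments whose posterior against the current updated query clears
    `threshold` extend the query; a path ends when its last fragment
    has no qualifying directly connected fragment left, at which point
    the best remaining ("second") fragment is scored, and the completed
    path with the highest target posterior is returned.
    """
    best_result, best_score = None, float("-inf")
    stack = [(query, [])]  # (current updated query, fragments selected so far)
    while stack:
        cur_query, path = stack.pop()
        pool = [s for s in segments if s not in path]
        # after the first step, candidates must be directly connected
        # to the last selected fragment in the clustering map
        connected = [s for s in pool
                     if not path or s in adjacency.get(path[-1], set())]
        chosen = [s for s in connected if posterior(cur_query, s) >= threshold]
        if chosen:
            for seg in chosen:
                stack.append((cur_query + " " + seg, path + [seg]))
        elif path and pool:
            # no directly connected fragment remains: compute the target
            # posterior against the remaining ("second") fragments
            score, seg = max((posterior(cur_query, s), s) for s in pool)
            if score > best_score:
                best_score, best_result = score, path + [seg]
    return best_result
```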
It can be seen that, in the embodiment of the present application, an input query statement is encoded by using a pre-training model, so as to obtain a query vector of the query statement; matching the query vector with at least one segment cluster in a pre-constructed clustering graph to determine a target segment cluster to which the query statement belongs from the at least one segment cluster; selecting at least one fragment from the target fragment cluster, obtaining an updated query statement according to the at least one fragment, and calculating the posterior probability of the updated query statement and the first fragment in the target fragment cluster; repeatedly executing the operation of selecting at least one fragment from the target fragment cluster according to the last obtained posterior probability and obtaining the current updated query statement according to the at least one fragment until no fragment directly connected with the currently selected at least one fragment exists in the target fragment cluster; and calculating the target posterior probability of the current updated query statement and the second segment in the target segment cluster, and returning the open domain question-answer result of the query statement according to the target posterior probability. The method includes the steps of firstly clustering fragment data to obtain a clustering graph, when query sentences are input, selecting a target fragment cluster from at least one fragment cluster, using the target fragment cluster as a database, screening at least one fragment from a layer in the target fragment cluster, and returning a fragment with the maximum target posterior probability and a fragment related to the fragment with the maximum target posterior probability in each layer as an open domain question-answer result.
Referring to fig. 4, fig. 4 is a schematic flow chart of another open-domain question-answer prediction method based on a pre-training model according to an embodiment of the present application; as shown in fig. 4, the method includes steps 401-410:
401: coding fragment data of each field by adopting a pre-training model to obtain at least one semantic vector;
402: carrying out logarithmic sampling on at least one semantic vector to obtain a first target number of points, and calculating the average distance between the points of the first target number of points;
403: repeatedly performing, K times, the operations of logarithmic sampling and calculating the average distance between points on the at least one semantic vector to obtain K average inter-point distances, and taking the mean of the K average inter-point distances as the clustering radius;
404: performing logarithmic sampling on the at least one semantic vector to obtain a second target number of points, randomly selecting one point from the second target number of points as a clustering center, and calculating the number of same-category points of the clustering center according to the radius and a predefined discriminant function;
405: repeatedly performing, K times, the operations of logarithmic sampling on the at least one semantic vector, randomly selecting a point as a clustering center, and calculating the number of same-category points of the clustering center, to obtain K count values, and taking the average value of the K count values as the neighborhood density threshold for clustering;
406: coding the input query statement by adopting a pre-training model to obtain a query vector of the query statement;
407: matching the query vector with at least one segment cluster in a pre-constructed clustering graph to determine a target segment cluster to which the query statement belongs from the at least one segment cluster; the at least one fragment cluster is obtained by clustering based on a radius and a neighborhood density threshold;
408: selecting at least one fragment from the target fragment cluster, obtaining an updated query statement according to the at least one fragment, and calculating the posterior probability of the updated query statement and a first fragment in the target fragment cluster;
409: repeatedly executing the operation of selecting at least one segment from the target segment cluster according to the last obtained posterior probability and obtaining the current updated query statement according to the at least one segment until no segment directly connected with the at least one currently selected segment exists in the target segment cluster;
410: and calculating the target posterior probability of the current updated query statement and the second segment in the target segment cluster, and returning the open domain question-answer result of the query statement according to the target posterior probability.
The specific implementation of steps 401-410 has been described in the embodiment shown in fig. 2 and can achieve the same or similar beneficial effects; to avoid repetition, the details are not repeated here.
Please refer to fig. 5 based on the description of the embodiment of the open-domain question-answer prediction method based on the pre-training model, where fig. 5 is a schematic structural diagram of an open-domain question-answer prediction apparatus based on the pre-training model according to the embodiment of the present application, and as shown in fig. 5, the apparatus includes:
the encoding unit 501 is configured to encode the input query statement by using a pre-training model to obtain a query vector of the query statement;
a matching unit 502, configured to match the query vector with at least one segment cluster in a pre-constructed clustering map, so as to determine a target segment cluster to which the query statement belongs from the at least one segment cluster;
an updating unit 503, configured to select at least one segment from the target segment cluster, obtain an updated query statement according to the at least one segment, and calculate a posterior probability between the updated query statement and a first segment in the target segment cluster;
the updating unit 503 is further configured to repeatedly perform operations of selecting at least one segment from the target segment cluster according to the last obtained posterior probability and obtaining a currently updated query statement according to the at least one segment until no segment directly connected to the currently selected at least one segment exists in the target segment cluster;
the predicting unit 504 is configured to calculate a target posterior probability of the current updated query statement and the second segment in the target segment cluster, and return an open domain question-answer result of the query statement according to the target posterior probability.
It can be seen that, in the open-domain question-answer prediction apparatus based on the pre-training model shown in fig. 5, the pre-training model is used to encode the input query statement, so as to obtain the query vector of the query statement; matching the query vector with at least one segment cluster in a pre-constructed clustering graph to determine a target segment cluster to which the query statement belongs from the at least one segment cluster; selecting at least one fragment from the target fragment cluster, obtaining an updated query statement according to the at least one fragment, and calculating the posterior probability of the updated query statement and a first fragment in the target fragment cluster; repeatedly executing the operation of selecting at least one segment from the target segment cluster according to the last obtained posterior probability and obtaining the current updated query statement according to the at least one segment until no segment directly connected with the at least one currently selected segment exists in the target segment cluster; and calculating the target posterior probability of the current updated query statement and the second segment in the target segment cluster, and returning the open domain question-answer result of the query statement according to the target posterior probability. The method includes the steps of firstly clustering fragment data to obtain a clustering graph, when query sentences are input, selecting a target fragment cluster from at least one fragment cluster, using the target fragment cluster as a database, screening at least one fragment from a layer in the target fragment cluster, and returning a fragment with the maximum target posterior probability and a fragment related to the fragment with the maximum target posterior probability in each layer as an open domain question-answer result.
In a possible embodiment, at least one fragment cluster is obtained by clustering fragment data of each domain, and the encoding unit 501 is further configured to:
determining a radius and a neighborhood density threshold value adopted for clustering fragment data of each field in a clustering algorithm;
and constructing a clustering graph based on the radius and the neighborhood density threshold.
In a possible implementation manner, in determining the radius used for clustering the fragment data of each field in the clustering algorithm, the encoding unit 501 is specifically configured to:
coding fragment data of each field by adopting a pre-training model to obtain at least one semantic vector;
carrying out logarithmic sampling on at least one semantic vector to obtain a first target number of points;
calculating the average distance between the points of the first target number of points;
repeatedly performing, K times, the operations of logarithmic sampling and calculating the average distance between points on the at least one semantic vector to obtain K average inter-point distances, wherein K is an integer greater than 1;
taking the mean of the K average inter-point distances as the radius.
In one possible implementation, in determining a neighborhood density threshold used for clustering fragment data of each field in a clustering algorithm, the encoding unit 501 is specifically configured to:
carrying out logarithmic sampling on at least one semantic vector to obtain a second target number of points;
randomly selecting one point from the second target number of points as a clustering center, and calculating the number of same-category points of the clustering center according to the radius and a predefined discriminant function;
repeatedly performing, K times, the operations of logarithmic sampling on the at least one semantic vector, randomly selecting a point as a clustering center, and calculating the number of same-category points of the clustering center, to obtain K count values;
and taking the average value of the K count values as the neighborhood density threshold.
In a possible implementation, in constructing the cluster map based on the radius and the neighborhood density threshold, the encoding unit 501 is specifically configured to:
starting from any semantic vector in at least one semantic vector, acquiring the number of neighborhood points of any semantic vector according to the radius, if the number of the neighborhood points is less than a neighborhood density threshold, determining any semantic vector as a boundary point, and if the number of the neighborhood points is greater than or equal to the neighborhood density threshold, determining any semantic vector as a core point;
if any semantic vector is a core point, determining that semantic vector together with the points density-reachable from it as a fragment cluster; if any semantic vector is a boundary point, adding that semantic vector to the fragment cluster of the core point from which it is density-reachable; continuing until the core points in the at least one semantic vector have all been clustered, so as to obtain at least one fragment cluster;
and assigning an edge to each pair of neighborhood points in each fragment cluster of the at least one fragment cluster to obtain the clustering map.
In a possible implementation manner, in matching the query vector with at least one segment cluster in the pre-constructed cluster map to determine a target segment cluster to which the query statement belongs from the at least one segment cluster, the matching unit 502 is specifically configured to:
calculating, for each of the at least one segment cluster, an average of the core points in each segment cluster;
taking the average value of the core points in each fragment cluster as the clustering center of each fragment cluster;
calculating the target distance between the query vector and the clustering center of each fragment cluster;
and determining the segment cluster represented by the cluster center with the minimum target distance in the at least one segment cluster as the target segment cluster.
In a possible implementation manner, in terms of encoding an input query statement by using a pre-training model to obtain a query vector of the query statement, the encoding unit 501 is specifically configured to:
preprocessing a query statement to obtain a word vector of the query statement;
calculating based on the word vectors to obtain a query matrix, a key matrix and a value matrix;
calculating to obtain attention weight based on the query matrix, the key matrix and the value matrix;
and multiplying the attention weight by the value matrix to obtain an attention vector, and encoding the attention vector to obtain a query vector.
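The encoding steps above (word vectors, then query/key/value matrices, then attention weights applied to the value matrix) can be sketched as single-head scaled dot-product self-attention. This is a minimal illustration, assuming `word_vectors` is the (tokens x dim) matrix from preprocessing; the projection matrices `w_q`, `w_k`, `w_v` and the final mean-pooling are hypothetical stand-ins for the pre-trained model's learned parameters and encoder.

```python
import numpy as np

def encode_query(word_vectors, w_q, w_k, w_v):
    """Encode a query via single-head scaled dot-product self-attention."""
    q = word_vectors @ w_q  # query matrix
    k = word_vectors @ w_k  # key matrix
    v = word_vectors @ w_v  # value matrix
    scores = q @ k.T / np.sqrt(k.shape[-1])
    # attention weights: numerically stable row-wise softmax
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    attention = weights @ v        # attention vectors, one per token
    return attention.mean(axis=0)  # mean-pool into a single query vector
```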
According to an embodiment of the present application, the units of the open-domain question-answering prediction apparatus based on the pre-trained model shown in fig. 5 may be respectively or completely combined into one or several other units to form the open-domain question-answering prediction apparatus, or some unit(s) thereof may be further split into multiple functionally smaller units to form the open-domain question-answering prediction apparatus, which may implement the same operation without affecting implementation of technical effects of embodiments of the present application. The units are divided based on logic functions, and in practical application, the functions of one unit can be realized by a plurality of units, or the functions of a plurality of units can be realized by one unit. In other embodiments of the present application, the open-domain question-answering prediction apparatus based on the pre-trained model may also include other units, and in practical applications, these functions may also be implemented by assistance of other units, and may be implemented by cooperation of multiple units.
According to another embodiment of the present application, the open-domain question-answer prediction apparatus based on the pre-trained model as shown in fig. 5 may be constructed by running a computer program (including program code) capable of executing the steps of the corresponding method shown in fig. 2 or fig. 4 on a general-purpose computing device, such as a computer, which includes processing elements and storage elements such as a Central Processing Unit (CPU), a random access memory (RAM), and a read-only memory (ROM), thereby implementing the open-domain question-answer prediction method based on the pre-trained model according to the embodiments of the present application. The computer program may be recorded on, for example, a computer-readable recording medium, and loaded into and executed by the above-mentioned computing device via the computer-readable recording medium.
Based on the description of the method embodiment and the device embodiment, the embodiment of the application further provides an electronic device. Referring to fig. 6, the electronic device includes at least a processor 601, an input device 602, an output device 603, and a computer storage medium 604. The processor 601, input device 602, output device 603, and computer storage medium 604 within the electronic device may be connected by a bus or other means.
A computer storage medium 604 may be stored in the memory of the electronic device, the computer storage medium 604 being for storing a computer program comprising program instructions, the processor 601 being for executing the program instructions stored by the computer storage medium 604. The processor 601 (or CPU) is a computing core and a control core of the electronic device, and is adapted to implement one or more instructions, and in particular, is adapted to load and execute the one or more instructions so as to implement a corresponding method flow or a corresponding function.
In one embodiment, the processor 601 of the electronic device provided in the embodiment of the present application may be configured to perform a series of open-domain question-answering prediction processes based on a pre-trained model:
coding the input query statement by adopting a pre-training model to obtain a query vector of the query statement;
matching the query vector with at least one segment cluster in a pre-constructed clustering graph to determine a target segment cluster to which the query statement belongs from the at least one segment cluster;
selecting at least one fragment from the target fragment cluster, obtaining an updated query statement according to the at least one fragment, and calculating the posterior probability of the updated query statement and a first fragment in the target fragment cluster;
repeatedly executing the operation of selecting at least one segment from the target segment cluster according to the last obtained posterior probability and obtaining the current updated query statement according to the at least one segment until no segment directly connected with the at least one currently selected segment exists in the target segment cluster;
and calculating the target posterior probability of the current updated query statement and the second segment in the target segment cluster, and returning the open domain question-answer result of the query statement according to the target posterior probability.
It can be seen that, in the electronic device shown in fig. 6, the query vectors of the query sentences are obtained by encoding the input query sentences with the pre-training model; matching the query vector with at least one segment cluster in a pre-constructed clustering graph to determine a target segment cluster to which the query statement belongs from the at least one segment cluster; selecting at least one fragment from the target fragment cluster, obtaining an updated query statement according to the at least one fragment, and calculating the posterior probability of the updated query statement and a first fragment in the target fragment cluster; repeatedly executing the operation of selecting at least one segment from the target segment cluster according to the last obtained posterior probability and obtaining the current updated query statement according to the at least one segment until no segment directly connected with the at least one currently selected segment exists in the target segment cluster; and calculating the target posterior probability of the current updated query statement and the second segment in the target segment cluster, and returning the open domain question-answer result of the query statement according to the target posterior probability. The method includes the steps of firstly clustering fragment data to obtain a clustering graph, when query sentences are input, selecting a target fragment cluster from at least one fragment cluster, using the target fragment cluster as a database, screening at least one fragment from a layer in the target fragment cluster, and returning a fragment with the maximum target posterior probability and a fragment related to the fragment with the maximum target posterior probability in each layer as an open domain question-answer result.
In another embodiment, at least one fragment cluster is obtained by clustering fragment data of each field, and before encoding an input query statement using a pre-training model to obtain a query vector of the query statement, the processor 601 is further configured to:
determining a radius and a neighborhood density threshold value adopted for clustering fragment data of each field in a clustering algorithm;
and constructing a clustering graph based on the radius and the neighborhood density threshold.
In another embodiment, when determining, in the clustering algorithm, the radius used for clustering the fragment data of each field, the processor 601 performs:
coding fragment data of each field by adopting a pre-training model to obtain at least one semantic vector;
carrying out logarithmic sampling on at least one semantic vector to obtain a first target number of points;
calculating the average distance between the points of the first target number of points;
repeatedly performing operations of performing logarithmic sampling and calculating the average distance between the points on at least one semantic vector for K times to obtain the average distance between the K points, wherein K is an integer greater than 1;
the average of the average distances between K points was taken as the radius.
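The radius-estimation procedure above (logarithmic sampling repeated K times, then averaging the per-round mean distances) might be sketched as below. Drawing roughly `log2(N)` points is only one plausible reading of "logarithmic sampling", which the text leaves unspecified; all names are illustrative.

```python
import math
import random

import numpy as np

def logarithmic_sample(vectors: np.ndarray, rng: random.Random) -> np.ndarray:
    # Draw roughly log2(N) of the N semantic vectors; the exact sampling
    # scheme is not specified in the text, so log2 is an assumption.
    n = len(vectors)
    idx = rng.sample(range(n), max(2, int(math.log2(n))))
    return vectors[idx]

def average_pairwise_distance(points: np.ndarray) -> float:
    dists = [float(np.linalg.norm(a - b))
             for i, a in enumerate(points) for b in points[i + 1:]]
    return float(np.mean(dists))

def estimate_radius(vectors: np.ndarray, K: int = 5, seed: int = 0) -> float:
    # Repeat "sample, then average the pairwise distances" K times and take
    # the mean of the K per-round averages as the radius.
    rng = random.Random(seed)
    rounds = [average_pairwise_distance(logarithmic_sample(vectors, rng))
              for _ in range(K)]
    return float(np.mean(rounds))

vectors = np.random.default_rng(0).normal(size=(256, 8))  # stand-in semantic vectors
radius = estimate_radius(vectors, K=5)
```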
In another embodiment, when determining the neighborhood density threshold used by the clustering algorithm to cluster the fragment data of each domain, the processor 601 performs:
performing logarithmic sampling on the at least one semantic vector to obtain a second target number of points;
randomly selecting one of the second target number of points as a cluster center, and calculating the number of same-cluster points of the cluster center according to the radius and a predefined discriminant function;
repeating the operations of logarithmic sampling, randomly selecting a cluster center and counting its same-cluster points K times to obtain K count values;
and taking the mean of the K count values as the neighborhood density threshold.
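The neighborhood-density-threshold step can be sketched the same way. A distance-within-radius test is used here as the discriminant, but that is an assumption: the text says only "a predefined discriminant function".

```python
import math
import random

import numpy as np

def neighbourhood_count(center: np.ndarray, points: np.ndarray, radius: float) -> int:
    # Assumed discriminant: a point counts as a "same-cluster" point of the
    # center if its Euclidean distance to the center is within the radius.
    return int(np.sum(np.linalg.norm(points - center, axis=1) <= radius))

def estimate_density_threshold(vectors: np.ndarray, radius: float,
                               K: int = 5, seed: int = 0) -> float:
    rng = random.Random(seed)
    counts = []
    for _ in range(K):
        # logarithmic sampling: draw ~log2(N) points (same assumption as before)
        idx = rng.sample(range(len(vectors)), max(2, int(math.log2(len(vectors)))))
        sample = vectors[idx]
        center = sample[rng.randrange(len(sample))]   # random cluster center
        counts.append(neighbourhood_count(center, sample, radius))
    # the neighborhood density threshold is the mean of the K count values
    return float(np.mean(counts))

vectors = np.random.default_rng(1).normal(size=(256, 8))
threshold = estimate_density_threshold(vectors, radius=3.0, K=5)
```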
In yet another embodiment, when constructing the cluster graph based on the radius and the neighborhood density threshold, the processor 601 performs:
starting from any semantic vector of the at least one semantic vector, obtaining the number of neighborhood points of that semantic vector according to the radius; if the number of neighborhood points is less than the neighborhood density threshold, determining the semantic vector to be a boundary point, and if the number of neighborhood points is greater than or equal to the neighborhood density threshold, determining the semantic vector to be a core point;
if the semantic vector is a core point, grouping it together with the points that are density-reachable from it into one fragment cluster; if the semantic vector is a boundary point, adding it to the fragment cluster of the core point from which it is density-reachable; and continuing until every core point in the at least one semantic vector has been clustered, thereby obtaining the at least one fragment cluster;
and assigning an edge between the neighborhood points within each of the at least one fragment cluster to obtain the cluster graph.
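The construction above is essentially a DBSCAN-style density clustering (core versus boundary points via the radius and the density threshold) followed by adding edges between neighborhood points inside each cluster. A compact sketch under that reading, using a brute-force distance matrix and illustrative names:

```python
import numpy as np

def build_cluster_graph(vectors: np.ndarray, radius: float, min_pts: int):
    n = len(vectors)
    dist = np.linalg.norm(vectors[:, None, :] - vectors[None, :, :], axis=-1)
    neighbours = [np.flatnonzero(dist[i] <= radius) for i in range(n)]  # incl. self
    core = np.array([len(nb) >= min_pts for nb in neighbours])

    labels = np.full(n, -1)  # -1 = not yet assigned
    cluster_id = 0
    for i in range(n):
        if not core[i] or labels[i] != -1:
            continue
        # grow one fragment cluster from this core point via density reachability
        stack, labels[i] = [i], cluster_id
        while stack:
            j = stack.pop()
            for k in neighbours[j]:
                if labels[k] == -1:
                    labels[k] = cluster_id      # boundary points join the cluster too
                    if core[k]:
                        stack.append(k)         # only core points keep expanding
        cluster_id += 1

    # assign an edge between neighbourhood points inside each fragment cluster
    edges = {c: set() for c in range(cluster_id)}
    for i in range(n):
        if labels[i] == -1:
            continue
        for j in neighbours[i]:
            if j != i and labels[j] == labels[i]:
                edges[labels[i]].add((min(i, j), max(i, j)))
    return labels, core, edges

# two tight groups of 5 identical points each -> two fragment clusters
pts = np.vstack([np.zeros((5, 2)), np.full((5, 2), 10.0)])
labels, core, edges = build_cluster_graph(pts, radius=1.0, min_pts=3)
```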
In another embodiment, when matching the query vector with the at least one fragment cluster in the pre-constructed cluster graph to determine, from the at least one fragment cluster, the target fragment cluster to which the query statement belongs, the processor 601 performs:
for each of the at least one fragment cluster, calculating the average of the core points in that fragment cluster;
taking the average of the core points in each fragment cluster as the cluster center of that fragment cluster;
calculating the target distance between the query vector and the cluster center of each fragment cluster;
and determining the fragment cluster whose cluster center has the minimum target distance as the target fragment cluster.
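The cluster-selection step can be sketched directly: compute each cluster center as the mean of its core points and pick the nearest one. The dictionary layout of `clusters` is an assumption made for illustration.

```python
import numpy as np

def select_target_cluster(query_vec: np.ndarray, clusters: dict) -> int:
    # clusters: cluster id -> array of that cluster's core-point vectors
    centers = {cid: pts.mean(axis=0) for cid, pts in clusters.items()}
    dists = {cid: float(np.linalg.norm(query_vec - c)) for cid, c in centers.items()}
    return min(dists, key=dists.get)  # cluster with the minimum target distance

clusters = {0: np.zeros((4, 3)), 1: np.full((4, 3), 5.0)}
target = select_target_cluster(np.array([4.8, 5.1, 5.0]), clusters)
```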
In another embodiment, when encoding the input query statement with the pre-training model to obtain the query vector of the query statement, the processor 601 performs:
preprocessing the query statement to obtain word vectors of the query statement;
calculating a query matrix, a key matrix and a value matrix based on the word vectors;
calculating an attention weight based on the query matrix, the key matrix and the value matrix;
and multiplying the attention weight by the value matrix to obtain an attention vector, and encoding the attention vector to obtain the query vector.
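The encoding steps above correspond to single-head scaled dot-product attention. A minimal sketch, assuming random projection matrices and mean-pooling as the final "encoding" step (the text specifies neither):

```python
import numpy as np

def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def encode_query(word_vectors, Wq, Wk, Wv):
    # query / key / value matrices computed from the word vectors
    Q, K, V = word_vectors @ Wq, word_vectors @ Wk, word_vectors @ Wv
    weights = softmax(Q @ K.T / np.sqrt(K.shape[-1]))  # attention weight
    attention = weights @ V                            # attention vectors
    # final "encoding" step: mean-pool into one query vector (an assumption;
    # the text does not say how the attention vector is encoded)
    return attention.mean(axis=0)

rng = np.random.default_rng(0)
words = rng.normal(size=(6, 16))                 # 6 word vectors, dimension 16
Wq, Wk, Wv = (rng.normal(size=(16, 16)) for _ in range(3))
query_vector = encode_query(words, Wq, Wk, Wv)
```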
By way of example, the electronic device includes, but is not limited to, a processor 601, an input device 602, an output device 603, and a computer storage medium 604, and may further include a memory, a power supply, an application client module, and the like. The input device 602 may be a keyboard, a touch screen, a radio-frequency receiver, etc., and the output device 603 may be a speaker, a display, a radio-frequency transmitter, etc. Those skilled in the art will appreciate that the schematic diagrams are merely examples of an electronic device and do not limit it; the electronic device may include more or fewer components than shown, combine some components, or use different components.
It should be noted that, since the processor 601 of the electronic device implements the steps of the pre-training-model-based open-domain question-answer prediction method by executing the computer program, all the embodiments of that method are applicable to the electronic device and achieve the same or similar beneficial effects.
An embodiment of the present application further provides a computer storage medium (memory), which is a storage device in an electronic device and is used to store programs and data. It is understood that the computer storage medium herein may include a storage medium built into the terminal, and may also include an extended storage medium supported by the terminal. The computer storage medium provides a storage space that stores the operating system of the terminal. Also stored in this storage space are one or more instructions, which may be one or more computer programs (including program code), suitable for loading and execution by the processor 601. The computer storage medium may be a high-speed RAM, or a non-volatile memory such as at least one disk memory; alternatively, it may be at least one computer storage medium located remotely from the processor 601. In one embodiment, the one or more instructions stored in the computer storage medium may be loaded and executed by the processor 601 to perform the corresponding steps of the pre-training-model-based open-domain question-answer prediction method described above.
Illustratively, the computer program of the computer storage medium includes computer program code, which may be in the form of source code, object code, an executable file, some intermediate form, or the like. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a Read-Only Memory (ROM), a Random Access Memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and the like.
It should be noted that, since the computer program of the computer storage medium is executed by the processor to implement the steps in the open-domain question-answer prediction method based on the pre-trained model, all the embodiments of the open-domain question-answer prediction method based on the pre-trained model are applicable to the computer storage medium, and can achieve the same or similar beneficial effects.
The foregoing detailed description of the embodiments illustrates the principles and implementations of the present application; the description of the embodiments above is provided only to help understand the method and core concept of the present application. Meanwhile, a person skilled in the art may, following the idea of the present application, vary the specific embodiments and the application scope. In summary, the content of this specification should not be construed as limiting the present application.
Claims (10)
1. An open domain question-answer prediction method based on a pre-training model is characterized by comprising the following steps:
encoding an input query statement with a pre-training model to obtain a query vector of the query statement;
matching the query vector with at least one fragment cluster in a pre-constructed cluster graph to determine, from the at least one fragment cluster, a target fragment cluster to which the query statement belongs;
selecting at least one fragment from the target fragment cluster, obtaining an updated query statement according to the at least one fragment, and calculating the posterior probability of the updated query statement and a first fragment in the target fragment cluster;
repeatedly performing the operations of selecting at least one fragment from the target fragment cluster according to the most recently obtained posterior probability and obtaining the current updated query statement according to the at least one fragment, until no fragment in the target fragment cluster is directly connected to the at least one currently selected fragment;
and calculating the target posterior probability of the current updated query statement and a second fragment in the target fragment cluster, and returning the open-domain question-answer result of the query statement according to the target posterior probability.
2. The method according to claim 1, wherein the at least one fragment cluster is obtained by clustering fragment data of each domain, and before the encoding of the input query statement with the pre-training model to obtain the query vector of the query statement, the method further comprises:
determining a radius and a neighborhood density threshold used by a clustering algorithm to cluster the fragment data of each domain;
and constructing the cluster graph based on the radius and the neighborhood density threshold.
3. The method of claim 2, wherein the determining of the radius used by the clustering algorithm to cluster the fragment data of each domain comprises:
encoding the fragment data of each domain with the pre-training model to obtain at least one semantic vector;
performing logarithmic sampling on the at least one semantic vector to obtain a first target number of points;
calculating the average distance between the points of the first target number;
repeating the operations of logarithmic sampling and average-distance calculation on the at least one semantic vector K times to obtain K average distances, wherein K is an integer greater than 1;
and taking the mean of the K average distances as the radius.
4. The method of claim 3, wherein the determining of the neighborhood density threshold used by the clustering algorithm to cluster the fragment data of each domain comprises:
performing logarithmic sampling on the at least one semantic vector to obtain a second target number of points;
randomly selecting one of the second target number of points as a cluster center, and calculating the number of same-cluster points of the cluster center according to the radius and a predefined discriminant function;
repeating the operations of logarithmic sampling, randomly selecting a cluster center and counting its same-cluster points K times to obtain K count values;
and taking the mean of the K count values as the neighborhood density threshold.
5. The method of claim 3, wherein the constructing of the cluster graph based on the radius and the neighborhood density threshold comprises:
starting from any semantic vector of the at least one semantic vector, obtaining the number of neighborhood points of that semantic vector according to the radius; if the number of neighborhood points is less than the neighborhood density threshold, determining the semantic vector to be a boundary point, and if the number of neighborhood points is greater than or equal to the neighborhood density threshold, determining the semantic vector to be a core point;
if the semantic vector is a core point, grouping it together with the points that are density-reachable from it into one fragment cluster; if the semantic vector is a boundary point, adding it to the fragment cluster of the core point from which it is density-reachable; and continuing until every core point in the at least one semantic vector has been clustered, thereby obtaining the at least one fragment cluster;
and assigning an edge between the neighborhood points within each of the at least one fragment cluster to obtain the cluster graph.
6. The method according to any one of claims 1 to 4, wherein the matching of the query vector with the at least one fragment cluster in the pre-constructed cluster graph to determine, from the at least one fragment cluster, the target fragment cluster to which the query statement belongs comprises:
for each of the at least one fragment cluster, calculating the average of the core points in that fragment cluster;
taking the average of the core points in each fragment cluster as the cluster center of that fragment cluster;
calculating the target distance between the query vector and the cluster center of each fragment cluster;
and determining the fragment cluster whose cluster center has the minimum target distance among the at least one fragment cluster as the target fragment cluster.
7. The method of claim 1, wherein the encoding of the input query statement with the pre-training model to obtain the query vector of the query statement comprises:
preprocessing the query statement to obtain word vectors of the query statement;
calculating a query matrix, a key matrix and a value matrix based on the word vectors;
calculating an attention weight based on the query matrix, the key matrix and the value matrix;
and multiplying the attention weight by the value matrix to obtain an attention vector, and encoding the attention vector to obtain the query vector.
8. An open-domain question-answer prediction device based on a pre-training model, the device comprising:
an encoding unit, configured to encode an input query statement with a pre-training model to obtain a query vector of the query statement;
a matching unit, configured to match the query vector with at least one fragment cluster in a pre-constructed cluster graph to determine, from the at least one fragment cluster, a target fragment cluster to which the query statement belongs;
an updating unit, configured to select at least one fragment from the target fragment cluster, obtain an updated query statement according to the at least one fragment, and calculate the posterior probability of the updated query statement and a first fragment in the target fragment cluster;
the updating unit being further configured to repeatedly perform the operations of selecting at least one fragment from the target fragment cluster according to the most recently obtained posterior probability and obtaining the current updated query statement according to the at least one fragment, until no fragment in the target fragment cluster is directly connected to the at least one currently selected fragment;
and a prediction unit, configured to calculate the target posterior probability of the current updated query statement and a second fragment in the target fragment cluster, and return the open-domain question-answer result of the query statement according to the target posterior probability.
9. An electronic device, comprising an input device and an output device, and further comprising:
a processor adapted to implement one or more instructions; and
a computer storage medium having one or more instructions stored thereon, the one or more instructions adapted to be loaded by the processor and to perform the method of any of claims 1-7.
10. A computer storage medium having stored thereon one or more instructions adapted to be loaded by a processor and to perform the method of any of claims 1-7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111167748.7A CN113723115B (en) | 2021-09-30 | 2021-09-30 | Open domain question-answer prediction method based on pre-training model and related equipment |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113723115A true CN113723115A (en) | 2021-11-30 |
CN113723115B CN113723115B (en) | 2024-02-09 |
Family
ID=78685636
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111167748.7A Active CN113723115B (en) | 2021-09-30 | 2021-09-30 | Open domain question-answer prediction method based on pre-training model and related equipment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113723115B (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115687031A (en) * | 2022-11-15 | 2023-02-03 | 北京优特捷信息技术有限公司 | Method, device, equipment and medium for generating alarm description text |
WO2023108995A1 (en) * | 2021-12-15 | 2023-06-22 | 平安科技(深圳)有限公司 | Vector similarity calculation method and apparatus, device and storage medium |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20140358928A1 (en) * | 2013-06-04 | 2014-12-04 | International Business Machines Corporation | Clustering Based Question Set Generation for Training and Testing of a Question and Answer System |
US20150293970A1 (en) * | 2014-04-10 | 2015-10-15 | Beijing Baidu Netcom Science And Technology Co., Ltd. | Information searching method and device |
US20180293302A1 (en) * | 2017-04-06 | 2018-10-11 | International Business Machines Corporation | Natural question generation from query data using natural language processing system |
CN110750629A (en) * | 2019-09-18 | 2020-02-04 | 平安科技(深圳)有限公司 | Robot dialogue generation method and device, readable storage medium and robot |
CN112487173A (en) * | 2020-12-18 | 2021-03-12 | 北京百度网讯科技有限公司 | Man-machine conversation method, device and storage medium |
KR20210051523A (en) * | 2019-10-30 | 2021-05-10 | 주식회사 솔트룩스 | Dialogue system by automatic domain classfication |
CN113139042A (en) * | 2021-04-25 | 2021-07-20 | 内蒙古工业大学 | Emotion controllable reply generation method using fine-tuning and reordering strategy |
WO2021169842A1 (en) * | 2020-02-24 | 2021-09-02 | 京东方科技集团股份有限公司 | Method and apparatus for updating data, electronic device, and computer readable storage medium |
Non-Patent Citations (1)
Title |
---|
TIANJIAO GUO: "Course Question Answering System Based on Artificial Intelligence", Application of Intelligent Systems in Multi-Modal Information Analytics: 2021 International Conference on Multi-Modal Information Analytics (MMIA 2021), Advances in Intelligent Systems and Computing, vol. 2, pp. 723-730 * |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110766142A (en) | Model generation method and device | |
US9852177B1 (en) | System and method for generating automated response to an input query received from a user in a human-machine interaction environment | |
CN113723115A (en) | Open domain question-answer prediction method based on pre-training model and related equipment | |
CN113268609A (en) | Dialog content recommendation method, device, equipment and medium based on knowledge graph | |
CN114691828A (en) | Data processing method, device, equipment and medium | |
CN111507108B (en) | Alias generation method and device, electronic equipment and computer readable storage medium | |
CN114358023A (en) | Intelligent question-answer recall method and device, computer equipment and storage medium | |
CN110489730A (en) | Text handling method, device, terminal and storage medium | |
CN109474516B (en) | Method and system for recommending instant messaging connection strategy based on convolutional neural network | |
CN116957128A (en) | Service index prediction method, device, equipment and storage medium | |
Liu et al. | Beyond top‐n accuracy indicator: a comprehensive evaluation indicator of cnn models in image classification | |
CN114880991A (en) | Knowledge map question-answer entity linking method, device, equipment and medium | |
CN117795527A (en) | Evaluation of output sequences using autoregressive language model neural networks | |
CN114268625B (en) | Feature selection method, device, equipment and storage medium | |
CN111400413B (en) | Method and system for determining category of knowledge points in knowledge base | |
CN111324722B (en) | Method and system for training word weight model | |
CN110147881B (en) | Language processing method, device, equipment and storage medium | |
CN113449079B (en) | Text abstract generating method and device, electronic equipment and storage medium | |
CN116680390B (en) | Vocabulary association recommendation method and system | |
US11755570B2 (en) | Memory-based neural network for question answering | |
CN115146258B (en) | Request processing method and device, storage medium and electronic equipment | |
CN111897884B (en) | Data relationship information display method and terminal equipment | |
CN116992017A (en) | Abnormal body detection method, device, equipment and storage medium | |
CN116414963A (en) | Method, device and storage medium for inquiring reply content | |
CN117827887A (en) | Recall method, system, electronic device and storage medium for complex domain dense channel |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||