CN113723115A - Open domain question-answer prediction method based on pre-training model and related equipment - Google Patents
- Publication number
- CN113723115A (application number CN202111167748.7A)
- Authority
- CN
- China
- Prior art keywords
- cluster
- fragment
- target
- segment
- query statement
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
- G06F40/35—Discourse or dialogue representation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/242—Query formulation
- G06F16/2433—Query languages
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/335—Filtering based on additional data, e.g. user or group profiles
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/36—Creation of semantic tools, e.g. ontology or thesauri
- G06F16/367—Ontology
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/284—Lexical analysis, e.g. tokenisation or collocates
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
The application relates to the technical field of artificial intelligence, and provides an open-domain question-answer prediction method based on a pre-training model, together with related equipment. The method comprises the following steps: encoding a query statement to obtain a query vector; matching the query vector against at least one segment cluster to determine the target segment cluster to which the query statement belongs; selecting at least one segment from the target segment cluster, obtaining an updated query statement from the at least one segment, and calculating the posterior probability of the updated query statement and the segments in the target segment cluster; repeating the operations of selecting at least one segment according to the posterior probability and updating the query statement accordingly, until the target segment cluster contains no segment directly connected to the currently selected segment(s); and calculating the posterior probability of the latest query statement and the segments in the target segment cluster, and returning a question-answer result according to that posterior probability. The method helps to improve prediction efficiency in open-domain question answering.
Description
Technical Field
The application relates to the technical field of intelligent question answering, in particular to an open domain question answering prediction method based on a pre-training model and related equipment.
Background
With the development of the internet, business volume in various industries has increased sharply and the customer base has gradually shifted from offline to online, while the number of human customer-service agents and the processing capacity of each enterprise are far from keeping up with the growth of online customers, so intelligent question-answering systems are urgently needed to alleviate this situation. Most existing intelligent question-answering systems are based on closed domains, i.e. the question-and-answer knowledge base is limited to a certain specific field, such as banking, insurance or consultation services. Driven by customer demand, researchers have proposed open-domain question answering (open-domain QA), which is not limited to a single field but learns knowledge from massive text documents across industries (such as knowledge bases like Wikipedia), so that questions from any field can be answered. In an existing open-domain question-answering system, the posterior probability between a query statement and each of a massive number of segments needs to be calculated one by one, and the high-probability segments extracted, which is computationally expensive and makes prediction inefficient.
Disclosure of Invention
To solve the above problems, the application provides an open-domain question-answer prediction method based on a pre-training model and related equipment, which helps to improve prediction efficiency in open-domain question answering.
In order to achieve the above object, a first aspect of the embodiments of the present application provides an open-domain question-answer prediction method based on a pre-training model, where the method includes:
coding the input query statement by adopting a pre-training model to obtain a query vector of the query statement;
matching the query vector with at least one segment cluster in a pre-constructed clustering graph to determine a target segment cluster to which the query statement belongs from the at least one segment cluster;
selecting at least one fragment from the target fragment cluster, obtaining an updated query statement according to the at least one fragment, and calculating the posterior probability of the updated query statement and a first fragment in the target fragment cluster;
repeatedly executing the operation of selecting at least one segment from the target segment cluster according to the last obtained posterior probability and obtaining the current updated query statement according to the at least one segment until no segment directly connected with the at least one currently selected segment exists in the target segment cluster;
and calculating the target posterior probability of the current updated query statement and the second segment in the target segment cluster, and returning the open domain question-answer result of the query statement according to the target posterior probability.
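The five steps above can be pictured end to end with a small sketch. This is a minimal illustration under assumed data structures, not the patented implementation: the clustering graph is represented as plain adjacency lists, and encoding and posterior calculation are abstracted behind a caller-supplied `posterior` scoring function; none of these names come from the patent itself.

```python
import math

def predict(query_vec, clusters, centers, adjacency, posterior, top_k=1):
    """Sketch of the claimed prediction loop (hypothetical helpers).

    clusters:  {cluster_id: [segment_id, ...]}
    centers:   {cluster_id: center_vector}
    adjacency: {segment_id: [directly connected segment_ids]}
    posterior: callable(query_vec, segment_id) -> probability
    """
    # Step 2: match the query vector to the nearest cluster center.
    target = min(centers, key=lambda c: math.dist(query_vec, centers[c]))
    frontier = clusters[target]

    # Steps 3-4: repeatedly pick the best-scoring segment(s) and continue
    # until the selected segments have no directly connected neighbours.
    selected = []
    while frontier:
        scored = sorted(frontier, key=lambda s: posterior(query_vec, s),
                        reverse=True)
        best = scored[:top_k]
        selected.extend(best)
        # "Updating the query" is abstracted here as restricting the frontier
        # to segments directly connected to the current selection.
        frontier = [n for s in best for n in adjacency.get(s, [])
                    if n not in selected]

    # Step 5: return the selected segments ranked by final posterior.
    return sorted(selected, key=lambda s: posterior(query_vec, s), reverse=True)
```

In this reading, restricting the search to one cluster and then to graph neighbours is exactly what keeps the number of posterior evaluations far below scoring the whole corpus.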
With reference to the first aspect, in a possible implementation manner, at least one fragment cluster is obtained by clustering fragment data of each field, and before encoding an input query statement by using a pre-training model to obtain a query vector of the query statement, the method further includes:
determining a radius and a neighborhood density threshold value adopted for clustering fragment data of each field in a clustering algorithm;
and constructing a clustering graph based on the radius and the neighborhood density threshold.
With reference to the first aspect, in a possible implementation manner, the determining a radius used for clustering fragment data of each field in a clustering algorithm includes:
coding fragment data of each field by adopting a pre-training model to obtain at least one semantic vector;
performing logarithmic sampling on the at least one semantic vector to obtain a first target number of points;
calculating the average distance between the first target number of points;
repeating the operations of logarithmic sampling and calculating the average inter-point distance on the at least one semantic vector K times to obtain K average distances, where K is an integer greater than 1;
and taking the mean of the K average distances as the radius.
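The radius estimation steps above can be sketched in standard-library Python. The helper name, the default number of rounds, and the `max`/`min` guards on the sample size are illustrative assumptions, not details from the patent:

```python
import math
import random
from itertools import combinations

def estimate_radius(vectors, k_rounds=5, seed=0):
    """Estimate the neighbourhood radius Eps: average, over K rounds, of the
    mean pairwise Euclidean distance among [ln N] log-sampled points."""
    rng = random.Random(seed)
    # [ln N] points per round, clamped to a workable range (assumption).
    n_sample = min(len(vectors), max(2, round(math.log(len(vectors)))))
    round_means = []
    for _ in range(k_rounds):
        pts = rng.sample(vectors, n_sample)
        dists = [math.dist(p, q) for p, q in combinations(pts, 2)]
        round_means.append(sum(dists) / len(dists))
    return sum(round_means) / k_rounds
```

Sampling only [ln N] points per round keeps the cost logarithmic instead of the O(N²) of an all-pairs distance computation, which is the efficiency argument made later in the description.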
With reference to the first aspect, in a possible implementation manner, determining a neighborhood density threshold used for clustering fragment data of each field in a clustering algorithm includes:
performing logarithmic sampling on the at least one semantic vector to obtain a second target number of points;
randomly selecting one point from the second target number of points as a clustering center, and calculating the number of points of the same class as the clustering center according to the radius and a predefined discriminant function;
repeating the operations of logarithmic sampling, randomly selecting a point as a clustering center and counting the same-class points of the clustering center K times to obtain K count values;
and taking the mean of the K count values as the neighborhood density threshold.
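The density-threshold steps above admit a sketch symmetrical to the radius one (again standard-library Python with illustrative names and defaults; whether the centre counts itself as a same-class point is an assumption, flagged in a comment):

```python
import math
import random

def estimate_minpts(vectors, eps, k_rounds=5, seed=0):
    """Estimate the neighbourhood density threshold MinPts: average, over K
    rounds, of how many sampled points fall within Eps of a random centre."""
    rng = random.Random(seed)
    n_sample = min(len(vectors), max(2, round(math.log(len(vectors)))))
    counts = []
    for _ in range(k_rounds):
        pts = rng.sample(vectors, n_sample)
        centre = rng.choice(pts)
        # Discriminant D(u) = 1 when dist/Eps <= 1, else 0; the centre itself
        # is at distance 0 and so is counted (assumption).
        counts.append(sum(1 for p in pts if math.dist(p, centre) <= eps))
    return sum(counts) / k_rounds
```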
With reference to the first aspect, in one possible implementation manner, constructing the clustering graph based on a radius and a neighborhood density threshold includes:
starting from any semantic vector of the at least one semantic vector, obtaining the number of neighborhood points of that semantic vector according to the radius; if the number of neighborhood points is less than the neighborhood density threshold, marking the semantic vector as a boundary point, and if it is greater than or equal to the threshold, marking it as a core point;
if the semantic vector is a core point, grouping it together with the points density-reachable from it into a segment cluster; if it is a boundary point, adding it to the segment cluster of the core point from which it is density-reachable; repeating until every core point among the at least one semantic vector has been clustered, yielding at least one segment cluster;
and assigning an edge between neighborhood points within each of the at least one segment cluster to obtain the clustering graph.
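The construction above reads as a DBSCAN-style clustering followed by adding an edge between neighbouring points of the same cluster. A compact sketch under that reading (function and variable names are illustrative; the patent does not fix whether a point counts itself among its neighbours — here it does not):

```python
import math

def build_cluster_graph(vectors, eps, minpts):
    """DBSCAN-style sketch: label core/boundary points, grow density-connected
    clusters from core points, then connect neighbouring points per cluster."""
    n = len(vectors)
    neigh = [[j for j in range(n)
              if j != i and math.dist(vectors[i], vectors[j]) <= eps]
             for i in range(n)]
    core = [len(neigh[i]) >= minpts for i in range(n)]

    label = [None] * n
    cluster_id = 0
    for i in range(n):
        if not core[i] or label[i] is not None:
            continue
        # Grow a cluster from this unlabelled core point.
        stack, label[i] = [i], cluster_id
        while stack:
            p = stack.pop()
            for q in neigh[p]:
                if label[q] is None:
                    label[q] = cluster_id   # boundary or core point joins
                    if core[q]:
                        stack.append(q)     # only core points keep expanding
        cluster_id += 1

    # Edges connect neighbouring points that ended up in the same cluster.
    edges = {(i, j) for i in range(n) for j in neigh[i]
             if i < j and label[i] is not None and label[i] == label[j]}
    return label, edges
```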
With reference to the first aspect, in a possible implementation manner, matching the query vector with at least one segment cluster in a pre-constructed clustering graph to determine a target segment cluster to which the query statement belongs from the at least one segment cluster includes:
calculating, for each of the at least one segment cluster, an average of the core points in each segment cluster;
taking the average value of the core points in each fragment cluster as the clustering center of each fragment cluster;
calculating the target distance between the query vector and the clustering center of each fragment cluster;
and determining the segment cluster represented by the cluster center with the minimum target distance in the at least one segment cluster as the target segment cluster.
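The matching steps above amount to nearest-centroid assignment, where each centroid is the mean of a cluster's core points. A small sketch (assumed data layout: each cluster id maps to the list of its core-point vectors):

```python
import math

def match_target_cluster(query_vec, clusters):
    """Pick the target segment cluster: each cluster centre is the mean of its
    core points, and the query is assigned to the nearest centre."""
    def centre(core_points):
        dim = len(core_points[0])
        return [sum(p[d] for p in core_points) / len(core_points)
                for d in range(dim)]
    centres = {cid: centre(core) for cid, core in clusters.items()}
    return min(centres, key=lambda cid: math.dist(query_vec, centres[cid]))
```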
With reference to the first aspect, in a possible implementation manner, encoding an input query statement by using a pre-training model to obtain a query vector of the query statement includes:
preprocessing a query statement to obtain a word vector of the query statement;
calculating based on the word vectors to obtain a query matrix, a key matrix and a value matrix;
calculating to obtain attention weight based on the query matrix, the key matrix and the value matrix;
and multiplying the attention weight by the value matrix to obtain an attention vector, and encoding the attention vector to obtain a query vector.
A second aspect of the embodiments of the present application provides an open domain question-answer prediction apparatus based on a pre-training model, where the apparatus includes:
the coding unit is used for coding the input query statement by adopting a pre-training model to obtain a query vector of the query statement;
the matching unit is used for matching the query vector with at least one segment cluster in a pre-constructed clustering graph so as to determine a target segment cluster to which the query statement belongs from the at least one segment cluster;
the updating unit is used for selecting at least one segment from the target segment cluster, obtaining an updated query statement according to the at least one segment, and calculating the posterior probability of the updated query statement and the first segment in the target segment cluster;
the updating unit is further used for repeatedly executing the operation of selecting at least one fragment from the target fragment cluster according to the last obtained posterior probability and obtaining the current updated query statement according to the at least one fragment until the target fragment cluster does not have the fragment directly connected with the at least one currently selected fragment;
and the prediction unit is used for calculating the target posterior probability of the current updated query statement and the second segment in the target segment cluster, and returning the open domain question-answer result of the query statement according to the target posterior probability.
A third aspect of embodiments of the present application provides an electronic device, which includes an input device, an output device, and a processor, and is adapted to implement one or more instructions; and a computer storage medium storing one or more instructions adapted to be loaded by the processor and to perform the steps of:
coding the input query statement by adopting a pre-training model to obtain a query vector of the query statement;
matching the query vector with at least one segment cluster in a pre-constructed clustering graph to determine a target segment cluster to which the query statement belongs from the at least one segment cluster;
selecting at least one fragment from the target fragment cluster, obtaining an updated query statement according to the at least one fragment, and calculating the posterior probability of the updated query statement and a first fragment in the target fragment cluster;
repeatedly executing the operation of selecting at least one segment from the target segment cluster according to the last obtained posterior probability and obtaining the current updated query statement according to the at least one segment until no segment directly connected with the at least one currently selected segment exists in the target segment cluster;
and calculating the target posterior probability of the current updated query statement and the second segment in the target segment cluster, and returning the open domain question-answer result of the query statement according to the target posterior probability.
A fourth aspect of embodiments of the present application provides a computer storage medium having one or more instructions stored thereon, the one or more instructions adapted to be loaded by a processor and to perform the following steps:
coding the input query statement by adopting a pre-training model to obtain a query vector of the query statement;
matching the query vector with at least one segment cluster in a pre-constructed clustering graph to determine a target segment cluster to which the query statement belongs from the at least one segment cluster;
selecting at least one fragment from the target fragment cluster, obtaining an updated query statement according to the at least one fragment, and calculating the posterior probability of the updated query statement and a first fragment in the target fragment cluster;
repeatedly executing the operation of selecting at least one segment from the target segment cluster according to the last obtained posterior probability and obtaining the current updated query statement according to the at least one segment until no segment directly connected with the at least one currently selected segment exists in the target segment cluster;
and calculating the target posterior probability of the current updated query statement and the second segment in the target segment cluster, and returning the open domain question-answer result of the query statement according to the target posterior probability.
The above scheme of the present application includes at least the following beneficial effects:
In the embodiment of the application, a pre-training model is used to encode the input query statement to obtain its query vector; the query vector is matched against at least one segment cluster in a pre-constructed clustering graph to determine, from the at least one segment cluster, the target segment cluster to which the query statement belongs; at least one segment is selected from the target segment cluster, an updated query statement is obtained from the at least one segment, and the posterior probability of the updated query statement and a first segment in the target segment cluster is calculated; the operations of selecting at least one segment from the target segment cluster according to the last obtained posterior probability and obtaining the currently updated query statement from the at least one segment are repeated until no segment in the target segment cluster is directly connected to the at least one currently selected segment; and the target posterior probability of the currently updated query statement and a second segment in the target segment cluster is calculated, and the open-domain question-answer result of the query statement is returned according to the target posterior probability. That is, the segment data is first clustered to obtain a clustering graph; when a query statement is input, a target segment cluster is selected from the at least one segment cluster and used as the search space; at least one segment is screened from each layer within the target segment cluster; and the segment with the maximum target posterior probability, together with the segments related to it in each layer, is returned as the open-domain question-answer result.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present application, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.
Fig. 1 is a schematic diagram of an application environment provided in an embodiment of the present application;
fig. 2 is a schematic flowchart of an open-domain question-answer prediction method based on a pre-training model according to an embodiment of the present application;
fig. 3 is a schematic diagram of clustering graph generation provided in an embodiment of the present application;
fig. 4 is a schematic flowchart of another open-domain question-answer prediction method based on a pre-training model according to an embodiment of the present application;
fig. 5 is a schematic structural diagram of an open-domain question-answer prediction apparatus based on a pre-training model according to an embodiment of the present application;
fig. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
In order to make the technical solutions better understood by those skilled in the art, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only partial embodiments of the present application, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The terms "comprising" and "having," and any variations thereof, as appearing in the specification, claims and drawings of this application, are intended to cover non-exclusive inclusions. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those steps or elements listed, but may alternatively include other steps or elements not listed, or inherent to such process, method, article, or apparatus. Furthermore, the terms "first," "second," and "third," etc. are used to distinguish between different objects and are not used to describe a particular order.
An embodiment of the present application provides an open domain question-answer prediction method based on a pre-training model, which may be implemented based on an application environment shown in fig. 1, please refer to fig. 1, where the application environment includes an electronic device and a user device connected to the electronic device through a network. Wherein the user equipment is provided with an input interface for receiving a query sentence input by a user, such as a consultation question sentence of the user for commodity details, and a communication interface for transmitting the query sentence to the electronic equipment. The electronic equipment receives the query statement through a communication interface of the electronic equipment, and transmits the query statement to the processor, so that the processor executes the open domain question-answer prediction method based on the pre-training model. Because the electronic equipment reduces the query range to the target segment cluster, and does not need to query in each segment cluster, the query calculation amount is greatly reduced, and the prediction efficiency in open domain question answering is favorably improved.
For example, the electronic device may be an independent server, or may be a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, web services, cloud communication, middleware services, domain name services, security services, a Content Delivery Network (CDN), and big data and artificial intelligence platforms. The user device may be a smartphone, a computer, a wearable device, a vehicle-mounted device, or the like.
Based on the application environment shown in fig. 1, the following describes in detail an open domain question-answering prediction method based on a pre-training model provided in an embodiment of the present application with reference to other drawings.
Referring to fig. 2, fig. 2 is a schematic flow chart of an open-domain question-answer prediction method based on a pre-training model according to an embodiment of the present application. The method is applied to an electronic device and, as shown in fig. 2, includes steps 201 to 205:
201: and coding the input query statement by adopting a pre-training model to obtain a query vector of the query statement.
In an embodiment of the present application, the pre-training model may be a BERT (Bidirectional Encoder Representations from Transformers) model, where the BERT model is trained and fine-tuned in advance on data from each domain, so that the model can learn deep information in the data of each domain. Illustratively, encoding the input query statement with the pre-training model to obtain the query vector of the query statement includes:
preprocessing a query statement to obtain a word vector of the query statement;
calculating based on the word vectors to obtain a query matrix, a key matrix and a value matrix;
calculating to obtain attention weight based on the query matrix, the key matrix and the value matrix;
and multiplying the attention weight by the value matrix to obtain an attention vector, and encoding the attention vector to obtain a query vector.
It should be understood that the BERT model encodes with a Transformer encoder. The bottom encoder preprocesses the input query statement (Query) to obtain corresponding word vectors; the preprocessing may be, for example, word embedding or one-hot encoding. The self-attention layer of the Transformer encoder constructs a query vector q, a key vector k and a value vector v from each word vector, and multiplies them respectively by the pre-trained weight matrices W_q, W_k and W_v to obtain the query matrix Q, the key matrix K and the value matrix V. The attention weight α is then calculated from the query matrix, the key matrix and the value matrix; finally, the attention weight α is multiplied by the value matrix V to obtain the attention vector output by the self-attention layer, and the attention vector is encoded by a feedforward neural network to obtain the query vector.
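The Q/K/V computation described here can be illustrated with a single-head scaled dot-product attention sketch in plain Python. The weight matrices below are toy stand-ins, not pre-trained BERT parameters, and the scaling by the square root of the key dimension follows the standard Transformer formulation rather than anything stated in this description:

```python
import math

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

def softmax(row):
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    s = sum(exps)
    return [e / s for e in exps]

def self_attention(word_vecs, w_q, w_k, w_v):
    """Single-head scaled dot-product attention over the token vectors:
    Q = X*Wq, K = X*Wk, V = X*Wv, weights = softmax(Q*K^T / sqrt(d)),
    output = weights * V."""
    q = matmul(word_vecs, w_q)
    k = matmul(word_vecs, w_k)
    v = matmul(word_vecs, w_v)
    d = len(k[0])
    scores = [[sum(qi * ki for qi, ki in zip(qr, kr)) / math.sqrt(d)
               for kr in k] for qr in q]
    weights = [softmax(row) for row in scores]   # the attention weight alpha
    return matmul(weights, v)                    # attention vectors
```

With identity weight matrices and one-hot token vectors, each token attends most strongly to itself, which is a quick sanity check on the weighting.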
202: and matching the query vector with at least one segment cluster in a pre-constructed clustering graph to determine a target segment cluster to which the query statement belongs from the at least one segment cluster.
In a specific embodiment of the present application, at least one fragment cluster is obtained by clustering fragment data in each field, and before encoding an input query statement using a pre-training model to obtain a query vector of the query statement, the method further includes:
determining a radius and a neighborhood density threshold value adopted for clustering fragment data of each field in a clustering algorithm;
and constructing a clustering graph based on the radius and the neighborhood density threshold.
For example, determining the radius used for clustering the fragment data of each field in the clustering algorithm includes:
coding fragment data of each field by adopting a pre-training model to obtain at least one semantic vector;
performing logarithmic sampling on the at least one semantic vector to obtain a first target number of points;
calculating the average distance between the first target number of points;
repeating the operations of logarithmic sampling and calculating the average inter-point distance on the at least one semantic vector K times to obtain K average distances, where K is an integer greater than 1;
and taking the mean of the K average distances as the radius.
Specifically, assuming the number of the at least one semantic vector is N, the number of points obtained by each logarithmic sampling is [ln N], i.e. the first target number is [ln N]. For any two of the [ln N] points, P_i and P_j, the Euclidean distance is used as the distance metric, so the average distance between the [ln N] points can be calculated by the following formula (reconstructed here as the mean over all point pairs):

eps = (2 / ([ln N] · ([ln N] − 1))) · Σ_{i<j} dist(P_i, P_j)

where eps represents the average distance between the [ln N] points and dist(P_i, P_j) represents the Euclidean distance between points P_i and P_j.
For the at least one semantic vector, in order to avoid unbalanced sampling, the operations of logarithmic sampling and calculating the average inter-point distance are repeated K times, giving K average distances, and their mean is taken as the neighborhood radius Eps:

Eps = (eps_1 + eps_2 + … + eps_K) / K

where eps_1, eps_2, …, eps_K are the average distances from the 1st, 2nd, …, Kth sampling rounds and Eps is the final neighborhood radius. Sampling [ln N] points is chosen because the number of samples in an open-domain scenario is large, usually tens of millions or more; if all pairwise distances were computed, the amount of calculation would be very large (of order N²). Logarithmic sampling therefore significantly reduces the amount of computation, while repeating the sampling K times addresses the sampling-imbalance problem.
For example, determining a neighborhood density threshold for clustering fragment data of each domain in a clustering algorithm includes:
carrying out logarithmic sampling on at least one semantic vector to obtain a second target number of points;
randomly selecting one point from the second target number of points as a clustering center, and calculating the number of same-category points of the clustering center according to the radius and a predefined discriminant function;
repeatedly performing, K times, the operations of logarithmic sampling on the at least one semantic vector, randomly selecting a point as a clustering center, and calculating the number of same-category points of the clustering center, to obtain K count values;
and taking the average value of the K count values as the neighborhood density threshold.
Specifically, in order to reduce the amount of computation, when the neighborhood density threshold is determined, logarithmic sampling is likewise performed on the at least one semantic vector to obtain [ln N] points, i.e. the second target number is [ln N]. A point X is then randomly selected from the [ln N] points as the clustering center, and the number of points belonging to the same category as the clustering center is calculated based on the previously determined radius parameter Eps and a predefined discriminant function, where the discriminant function is defined as:

D(u) = 1 if u ≤ 1, and D(u) = 0 otherwise

where u represents the ratio of the distance between two points to Eps; the discriminant function D(u) expresses that a point whose distance from the clustering center is no greater than Eps is a same-category point. The number of same-category points is calculated by the following formula:

Count = Σ_{i=1}^{m} D(dist(P_i, X) / Eps), where m = [ln N]

wherein Count represents the number of same-category points of the clustering center X in a single calculation, and dist(P_i, X) represents the Euclidean distance between point P_i and the clustering center X. Similar to the radius parameter Eps, in order to avoid sampling imbalance, the operations of logarithmic sampling, selecting a clustering center, and counting its same-category points are repeated K times to obtain K count values, and the mean of the K count values is calculated as the neighborhood density threshold Minpts, according to the following formula:

Minpts = (Count_1 + Count_2 + … + Count_K) / K

wherein Count_1, Count_2, …, Count_K are the count values obtained in the 1st, 2nd, …, Kth rounds respectively. In this embodiment, similar to the radius parameter Eps, the neighborhood density threshold Minpts is determined adaptively. In clustering, the radius Eps and the neighborhood density threshold Minpts must be determined in advance, and different choices of these two values often yield distinctly different clustering results, thereby affecting the accuracy of the final returned result.
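The adaptive Minpts estimation can be sketched in the same style. Again a hedged illustration with hypothetical names; the discriminant D(u) is applied by directly testing whether each sampled point lies within Eps of the chosen center.

```python
import math
import random

import numpy as np

def estimate_minpts(vectors: np.ndarray, eps: float, K: int = 5, seed: int = 0) -> int:
    """Estimate the neighborhood density threshold Minpts.

    Each round log-samples roughly ln(N) points, picks a random
    clustering center X among them, and counts the sampled points
    within eps of X (i.e. D(dist/eps) = 1); Minpts is the mean count
    over the K rounds.
    """
    rng = random.Random(seed)
    n = len(vectors)
    m = max(2, math.ceil(math.log(n)))  # the second target number, [ln N]
    counts = []
    for _ in range(K):
        sample = vectors[rng.sample(range(n), m)]
        center = sample[rng.randrange(m)]
        # same-category points: distance ratio u = dist/eps is at most 1
        count = sum(1 for p in sample if np.linalg.norm(p - center) <= eps)
        counts.append(count)  # Count_k
    return round(sum(counts) / K)  # Minpts
```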
Illustratively, constructing the clustering graph based on the radius and the neighborhood density threshold comprises:
starting from any semantic vector in at least one semantic vector, acquiring the number of neighborhood points of any semantic vector according to the radius, if the number of the neighborhood points is less than a neighborhood density threshold, determining any semantic vector as a boundary point, and if the number of the neighborhood points is greater than or equal to the neighborhood density threshold, determining any semantic vector as a core point;
if any semantic vector is a core point, determining that semantic vector together with the points density-reachable from it as a fragment cluster; if any semantic vector is a boundary point, adding that semantic vector to the fragment cluster of the core point from which it is density-reachable; continuing until the core points in the at least one semantic vector have all been clustered, so as to obtain at least one fragment cluster;
and assigning an edge to each pair of neighborhood points in each fragment cluster of the at least one fragment cluster to obtain the clustering map.
Specifically, each piece of fragment data corresponds to one semantic vector, and each semantic vector is represented as a point in a high-dimensional space. Taking any semantic vector as a point p, the number of neighborhood points of p is determined according to the preset radius Eps: if the number of neighborhood points of p is smaller than the neighborhood density threshold Minpts, p is a boundary point; if the number of neighborhood points of p is greater than or equal to Minpts, p is a core point. As shown in fig. 3, if the neighborhood density threshold Minpts is 3 and 3 points exist in the neighborhood of point p, then p is a core point, while if only two points exist in the neighborhood of point q, then q is a boundary point. If point p is a core point, a fragment cluster can be determined, and all points density-reachable from p belong to that fragment cluster; if point p is a boundary point, p is assigned to the fragment cluster of the core point from which it is density-reachable. The fragment clusters of all core points are determined in this way, yielding at least one fragment cluster. For each fragment cluster of the at least one fragment cluster, an edge is given to the neighborhood points within the cluster; for example, in fig. 3, point q is in the neighborhood of point p, so an edge is given between p and q, and point s is in the neighborhood of p, so an edge is given between p and s. This yields a graph for each fragment cluster, and these graphs together form the clustering map, which is stored for subsequent matching.
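The core-point/boundary-point clustering with edge assignment described above can be sketched as a DBSCAN-style procedure. This is a brute-force O(N^2) illustration with hypothetical names; a production system over tens of millions of vectors would need an indexed neighbor search.

```python
import numpy as np

def build_cluster_graph(vectors, eps, minpts):
    """DBSCAN-style fragment clustering plus per-cluster edge lists.

    A point with at least `minpts` neighbors within `eps` is a core
    point; clusters grow by density-reachable expansion from core
    points, boundary points join the cluster of a reachable core point,
    and each in-cluster neighborhood pair is given an edge.
    """
    n = len(vectors)
    neighbors = [
        [j for j in range(n) if j != i
         and np.linalg.norm(vectors[i] - vectors[j]) <= eps]
        for i in range(n)
    ]
    core = [len(nb) >= minpts for nb in neighbors]
    labels = [-1] * n  # -1 = not yet assigned to any fragment cluster
    cluster_id = 0
    for i in range(n):
        if not core[i] or labels[i] != -1:
            continue
        labels[i] = cluster_id
        frontier = [i]
        while frontier:  # density-reachable expansion from core points
            p = frontier.pop()
            for q in neighbors[p]:
                if labels[q] == -1:
                    labels[q] = cluster_id  # core or boundary point joins
                    if core[q]:
                        frontier.append(q)
        cluster_id += 1
    # give each in-cluster neighborhood pair an edge
    edges = {c: set() for c in range(cluster_id)}
    for i in range(n):
        if labels[i] == -1:
            continue
        for j in neighbors[i]:
            if labels[j] == labels[i] and i < j:
                edges[labels[i]].add((i, j))
    return labels, edges
```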
Illustratively, matching the query vector with at least one segment cluster in a pre-constructed clustering graph to determine a target segment cluster to which the query statement belongs from the at least one segment cluster, includes:
calculating, for each of the at least one segment cluster, an average of the core points in each segment cluster;
taking the average value of the core points in each fragment cluster as the clustering center of each fragment cluster;
calculating the target distance between the query vector and the clustering center of each fragment cluster;
and determining the segment cluster represented by the cluster center with the minimum target distance in the at least one segment cluster as the target segment cluster.
The cluster center with the minimum target distance to the query vector is the cluster center closest to the query vector, and therefore the query statement belongs to the category represented by that cluster center.
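The nearest-center matching above can be sketched as follows; the function name and the dictionary-of-clusters representation are illustrative assumptions.

```python
import numpy as np

def match_cluster(query_vec, clusters):
    """Pick the target fragment cluster whose center is nearest the query.

    `clusters` maps a cluster id to the array of that cluster's
    core-point vectors; the cluster center is the mean of the core
    points, and the target distance is the Euclidean distance from
    the query vector to that center.
    """
    best_id, best_dist = None, float("inf")
    for cid, core_points in clusters.items():
        center = np.mean(core_points, axis=0)               # cluster center
        dist = float(np.linalg.norm(query_vec - center))    # target distance
        if dist < best_dist:
            best_id, best_dist = cid, dist
    return best_id
```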
203: and selecting at least one fragment from the target fragment cluster, obtaining an updated query statement according to the at least one fragment, and calculating the posterior probability of the updated query statement and the first fragment in the target fragment cluster.
In the embodiment of the application, the target fragment cluster is used as the database for the Query. The posterior probability of the Query vector with each fragment in the target fragment cluster is calculated, the fragments in the target fragment cluster are sorted by posterior probability, and at least one fragment whose posterior probability is greater than or equal to a preset value is selected; for example, the selected fragments are P1, P2 and P3. Each of P1, P2 and P3 is combined with the Query statement to obtain an updated Query statement; for example, P1 yields the updated Query statement Query+P1. The updated Query statement Query+P1 is then used as a new input, and the posterior probability of Query+P1 with each first fragment in the target fragment cluster is calculated, where the first fragments are the fragments in the target fragment cluster other than P1.
204: and repeating the operation of selecting at least one fragment from the target fragment cluster according to the last obtained posterior probability and obtaining the current updated query statement according to the at least one fragment until no fragment directly connected with the at least one currently selected fragment exists in the target fragment cluster.
In the embodiment of the application, according to the posterior probability obtained in the last calculation, the fragments in the target fragment cluster other than P1 are sorted, and at least one fragment whose posterior probability is greater than or equal to the preset value is selected; for example, the selected fragments are P11, P12 and P13. Each of them is combined with the last input Query+P1 to form a currently updated Query statement; for example, P12 yields the currently updated Query statement Query+P1+P12. The above operations are repeated until, on each of the paths started from P1, P2 and P3, the currently selected at least one fragment has no directly connected fragment in the target fragment cluster, that is, analysis of the clustering map shows that no fragment in the target fragment cluster is related to the currently selected at least one fragment.
205: and calculating the target posterior probability of the current updated query statement and the second segment in the target segment cluster, and returning the open domain question-answer result of the query statement according to the target posterior probability.
In the embodiment of the application, assuming that after the currently updated Query statement Query+P1+P12 is formed, no fragment directly connected to P12 exists in the target fragment cluster, the updating of the input is stopped. The target posterior probability of the currently updated Query statement Query+P1+P12 with each second fragment in the target fragment cluster is calculated, where the second fragments are the fragments in the target fragment cluster other than P12. The second fragments are sorted by target posterior probability, the fragment with the maximum target posterior probability, e.g. P115, is selected, and P115, P12 and P1 are taken as the open-domain question-answer result of the Query statement, which is then returned to the user. The above is only an example; in an actual scenario, there are also updated query statements built from P2 and P3, and the maximum target posterior probability refers to the maximum posterior probability over all currently updated query statements. Depending on the required correlation between fragments, the number of fragments selected at each step may be the same or different; for example, as the input is updated further, the calculated posterior probabilities may decrease overall, so the number of selected fragments can follow a decreasing trend to reduce the amount of computation.
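The iterative selection of steps 203 through 205 can be sketched as a graph search over the target fragment cluster. This is a simplified illustration: `posterior(q, s)` is a hypothetical scoring function standing in for the pre-trained model's posterior probability, `adjacency` encodes the clustering map's direct connections, and the path bookkeeping is reduced relative to the patent's description.

```python
def answer_search(query, segments, adjacency, posterior, threshold=0.5):
    """Iterative fragment selection over a target fragment cluster.

    Fragments whose posterior against the current updated query clears
    `threshold` extend the query; a path ends when its last fragment
    has no qualifying directly connected fragment left, at which point
    the best remaining ("second") fragment is scored, and the completed
    path with the highest target posterior is returned.
    """
    best_result, best_score = None, float("-inf")
    stack = [(query, [])]  # (current updated query, fragments selected so far)
    while stack:
        cur_query, path = stack.pop()
        pool = [s for s in segments if s not in path]
        # after the first step, candidates must be directly connected
        # to the last selected fragment in the clustering map
        connected = [s for s in pool
                     if not path or s in adjacency.get(path[-1], set())]
        chosen = [s for s in connected if posterior(cur_query, s) >= threshold]
        if chosen:
            for seg in chosen:
                stack.append((cur_query + " " + seg, path + [seg]))
        elif path and pool:
            # no directly connected fragment remains: compute the target
            # posterior against the remaining ("second") fragments
            score, seg = max((posterior(cur_query, s), s) for s in pool)
            if score > best_score:
                best_score, best_result = score, path + [seg]
    return best_result
```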
It can be seen that, in the embodiment of the present application, an input query statement is encoded by using a pre-training model, so as to obtain a query vector of the query statement; matching the query vector with at least one segment cluster in a pre-constructed clustering graph to determine a target segment cluster to which the query statement belongs from the at least one segment cluster; selecting at least one fragment from the target fragment cluster, obtaining an updated query statement according to the at least one fragment, and calculating the posterior probability of the updated query statement and the first fragment in the target fragment cluster; repeatedly executing the operation of selecting at least one fragment from the target fragment cluster according to the last obtained posterior probability and obtaining the current updated query statement according to the at least one fragment until no fragment directly connected with the currently selected at least one fragment exists in the target fragment cluster; and calculating the target posterior probability of the current updated query statement and the second segment in the target segment cluster, and returning the open domain question-answer result of the query statement according to the target posterior probability. The method includes the steps of firstly clustering fragment data to obtain a clustering graph, when query sentences are input, selecting a target fragment cluster from at least one fragment cluster, using the target fragment cluster as a database, screening at least one fragment from a layer in the target fragment cluster, and returning a fragment with the maximum target posterior probability and a fragment related to the fragment with the maximum target posterior probability in each layer as an open domain question-answer result.
Referring to fig. 4, fig. 4 is a schematic flow chart of another open-domain question-answer prediction method based on a pre-training model according to an embodiment of the present application; as shown in fig. 4, the method includes steps 401-410:
401: coding fragment data of each field by adopting a pre-training model to obtain at least one semantic vector;
402: carrying out logarithmic sampling on at least one semantic vector to obtain a first target number of points, and calculating the average distance between the points of the first target number of points;
403: repeatedly performing, K times, the operations of logarithmic sampling and calculating the average distance between points on the at least one semantic vector to obtain K average inter-point distances, and taking the mean of the K average inter-point distances as the clustering radius;
404: performing logarithmic sampling on the at least one semantic vector to obtain a second target number of points, randomly selecting one point from the second target number of points as a clustering center, and calculating the number of same-category points of the clustering center according to the radius and a predefined discriminant function;
405: repeatedly performing, K times, the operations of logarithmic sampling on the at least one semantic vector, randomly selecting a point as a clustering center, and calculating the number of same-category points of the clustering center, to obtain K count values, and taking the average value of the K count values as the neighborhood density threshold for clustering;
406: coding the input query statement by adopting a pre-training model to obtain a query vector of the query statement;
407: matching the query vector with at least one segment cluster in a pre-constructed clustering graph to determine a target segment cluster to which the query statement belongs from the at least one segment cluster; the at least one fragment cluster is obtained by clustering based on a radius and a neighborhood density threshold;
408: selecting at least one fragment from the target fragment cluster, obtaining an updated query statement according to the at least one fragment, and calculating the posterior probability of the updated query statement and a first fragment in the target fragment cluster;
409: repeatedly executing the operation of selecting at least one segment from the target segment cluster according to the last obtained posterior probability and obtaining the current updated query statement according to the at least one segment until no segment directly connected with the at least one currently selected segment exists in the target segment cluster;
410: and calculating the target posterior probability of the current updated query statement and the second segment in the target segment cluster, and returning the open domain question-answer result of the query statement according to the target posterior probability.
The specific implementation of steps 401-410 has been described in the embodiment shown in fig. 2 and can achieve the same or similar beneficial effects; to avoid repetition, the details are not repeated here.
Please refer to fig. 5 based on the description of the embodiment of the open-domain question-answer prediction method based on the pre-training model, where fig. 5 is a schematic structural diagram of an open-domain question-answer prediction apparatus based on the pre-training model according to the embodiment of the present application, and as shown in fig. 5, the apparatus includes:
the encoding unit 501 is configured to encode the input query statement by using a pre-training model to obtain a query vector of the query statement;
a matching unit 502, configured to match the query vector with at least one segment cluster in a pre-constructed clustering map, so as to determine a target segment cluster to which the query statement belongs from the at least one segment cluster;
an updating unit 503, configured to select at least one segment from the target segment cluster, obtain an updated query statement according to the at least one segment, and calculate a posterior probability between the updated query statement and a first segment in the target segment cluster;
the updating unit 503 is further configured to repeatedly perform operations of selecting at least one segment from the target segment cluster according to the last obtained posterior probability and obtaining a currently updated query statement according to the at least one segment until no segment directly connected to the currently selected at least one segment exists in the target segment cluster;
the predicting unit 504 is configured to calculate a target posterior probability of the current updated query statement and the second segment in the target segment cluster, and return an open domain question-answer result of the query statement according to the target posterior probability.
It can be seen that, in the open-domain question-answer prediction apparatus based on the pre-training model shown in fig. 5, the pre-training model is used to encode the input query statement, so as to obtain the query vector of the query statement; matching the query vector with at least one segment cluster in a pre-constructed clustering graph to determine a target segment cluster to which the query statement belongs from the at least one segment cluster; selecting at least one fragment from the target fragment cluster, obtaining an updated query statement according to the at least one fragment, and calculating the posterior probability of the updated query statement and a first fragment in the target fragment cluster; repeatedly executing the operation of selecting at least one segment from the target segment cluster according to the last obtained posterior probability and obtaining the current updated query statement according to the at least one segment until no segment directly connected with the at least one currently selected segment exists in the target segment cluster; and calculating the target posterior probability of the current updated query statement and the second segment in the target segment cluster, and returning the open domain question-answer result of the query statement according to the target posterior probability. The method includes the steps of firstly clustering fragment data to obtain a clustering graph, when query sentences are input, selecting a target fragment cluster from at least one fragment cluster, using the target fragment cluster as a database, screening at least one fragment from a layer in the target fragment cluster, and returning a fragment with the maximum target posterior probability and a fragment related to the fragment with the maximum target posterior probability in each layer as an open domain question-answer result.
In a possible embodiment, at least one fragment cluster is obtained by clustering fragment data of each domain, and the encoding unit 501 is further configured to:
determining a radius and a neighborhood density threshold value adopted for clustering fragment data of each field in a clustering algorithm;
and constructing a clustering graph based on the radius and the neighborhood density threshold.
In a possible implementation manner, in determining the radius used for clustering the fragment data of each field in the clustering algorithm, the encoding unit 501 is specifically configured to:
coding fragment data of each field by adopting a pre-training model to obtain at least one semantic vector;
carrying out logarithmic sampling on at least one semantic vector to obtain a first target number of points;
calculating the average distance between the points of the first target number of points;
repeatedly performing, K times, the operations of logarithmic sampling and calculating the average distance between points on the at least one semantic vector to obtain K average inter-point distances, wherein K is an integer greater than 1;
taking the mean of the K average inter-point distances as the radius.
In one possible implementation, in determining a neighborhood density threshold used for clustering fragment data of each field in a clustering algorithm, the encoding unit 501 is specifically configured to:
carrying out logarithmic sampling on at least one semantic vector to obtain a second target number of points;
randomly selecting one point from the second target number of points as a clustering center, and calculating the number of same-category points of the clustering center according to the radius and a predefined discriminant function;
repeatedly performing, K times, the operations of logarithmic sampling on the at least one semantic vector, randomly selecting a point as a clustering center, and calculating the number of same-category points of the clustering center, to obtain K count values;
and taking the average value of the K count values as the neighborhood density threshold.
In a possible implementation, in constructing the cluster map based on the radius and the neighborhood density threshold, the encoding unit 501 is specifically configured to:
starting from any semantic vector in at least one semantic vector, acquiring the number of neighborhood points of any semantic vector according to the radius, if the number of the neighborhood points is less than a neighborhood density threshold, determining any semantic vector as a boundary point, and if the number of the neighborhood points is greater than or equal to the neighborhood density threshold, determining any semantic vector as a core point;
if any semantic vector is a core point, determining that semantic vector together with the points density-reachable from it as a fragment cluster; if any semantic vector is a boundary point, adding that semantic vector to the fragment cluster of the core point from which it is density-reachable; continuing until the core points in the at least one semantic vector have all been clustered, so as to obtain at least one fragment cluster;
and assigning an edge to each pair of neighborhood points in each fragment cluster of the at least one fragment cluster to obtain the clustering map.
In a possible implementation manner, in matching the query vector with at least one segment cluster in the pre-constructed cluster map to determine a target segment cluster to which the query statement belongs from the at least one segment cluster, the matching unit 502 is specifically configured to:
calculating, for each of the at least one segment cluster, an average of the core points in each segment cluster;
taking the average value of the core points in each fragment cluster as the clustering center of each fragment cluster;
calculating the target distance between the query vector and the clustering center of each fragment cluster;
and determining the segment cluster represented by the cluster center with the minimum target distance in the at least one segment cluster as the target segment cluster.
In a possible implementation manner, in terms of encoding an input query statement by using a pre-training model to obtain a query vector of the query statement, the encoding unit 501 is specifically configured to:
preprocessing a query statement to obtain a word vector of the query statement;
calculating based on the word vectors to obtain a query matrix, a key matrix and a value matrix;
calculating to obtain attention weight based on the query matrix, the key matrix and the value matrix;
and multiplying the attention weight by the value matrix to obtain an attention vector, and encoding the attention vector to obtain a query vector.
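The encoding steps above (word vectors, then query/key/value matrices, then attention weights applied to the value matrix) can be sketched as single-head scaled dot-product self-attention. This is a minimal illustration, assuming `word_vectors` is the (tokens x dim) matrix from preprocessing; the projection matrices `w_q`, `w_k`, `w_v` and the final mean-pooling are hypothetical stand-ins for the pre-trained model's learned parameters and encoder.

```python
import numpy as np

def encode_query(word_vectors, w_q, w_k, w_v):
    """Encode a query via single-head scaled dot-product self-attention."""
    q = word_vectors @ w_q  # query matrix
    k = word_vectors @ w_k  # key matrix
    v = word_vectors @ w_v  # value matrix
    scores = q @ k.T / np.sqrt(k.shape[-1])
    # attention weights: numerically stable row-wise softmax
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    attention = weights @ v        # attention vectors, one per token
    return attention.mean(axis=0)  # mean-pool into a single query vector
```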
According to an embodiment of the present application, the units of the open-domain question-answering prediction apparatus based on the pre-trained model shown in fig. 5 may be respectively or completely combined into one or several other units to form the open-domain question-answering prediction apparatus, or some unit(s) thereof may be further split into multiple functionally smaller units to form the open-domain question-answering prediction apparatus, which may implement the same operation without affecting implementation of technical effects of embodiments of the present application. The units are divided based on logic functions, and in practical application, the functions of one unit can be realized by a plurality of units, or the functions of a plurality of units can be realized by one unit. In other embodiments of the present application, the open-domain question-answering prediction apparatus based on the pre-trained model may also include other units, and in practical applications, these functions may also be implemented by assistance of other units, and may be implemented by cooperation of multiple units.
According to another embodiment of the present application, the open-domain question-answer prediction apparatus based on the pre-trained model as shown in fig. 5 may be constructed by running a computer program (including program code) capable of executing the steps of the corresponding method shown in fig. 2 or fig. 4 on a general-purpose computing device, such as a computer, which includes processing elements and storage elements such as a Central Processing Unit (CPU), a random access memory (RAM), and a read-only memory (ROM), thereby implementing the open-domain question-answer prediction method based on the pre-trained model according to the embodiments of the present application. The computer program may be recorded on, for example, a computer-readable recording medium, and loaded into and executed by the above-mentioned computing device via the computer-readable recording medium.
Based on the description of the method embodiment and the device embodiment, the embodiment of the application further provides an electronic device. Referring to fig. 6, the electronic device includes at least a processor 601, an input device 602, an output device 603, and a computer storage medium 604. The processor 601, input device 602, output device 603, and computer storage medium 604 within the electronic device may be connected by a bus or other means.
A computer storage medium 604 may be stored in the memory of the electronic device, the computer storage medium 604 being for storing a computer program comprising program instructions, the processor 601 being for executing the program instructions stored by the computer storage medium 604. The processor 601 (or CPU) is a computing core and a control core of the electronic device, and is adapted to implement one or more instructions, and in particular, is adapted to load and execute the one or more instructions so as to implement a corresponding method flow or a corresponding function.
In one embodiment, the processor 601 of the electronic device provided in the embodiment of the present application may be configured to perform a series of open-domain question-answering prediction processes based on a pre-trained model:
coding the input query statement by adopting a pre-training model to obtain a query vector of the query statement;
matching the query vector with at least one segment cluster in a pre-constructed clustering graph to determine a target segment cluster to which the query statement belongs from the at least one segment cluster;
selecting at least one fragment from the target fragment cluster, obtaining an updated query statement according to the at least one fragment, and calculating the posterior probability of the updated query statement and a first fragment in the target fragment cluster;
repeatedly executing the operation of selecting at least one segment from the target segment cluster according to the last obtained posterior probability and obtaining the current updated query statement according to the at least one segment until no segment directly connected with the at least one currently selected segment exists in the target segment cluster;
and calculating the target posterior probability of the current updated query statement and the second segment in the target segment cluster, and returning the open domain question-answer result of the query statement according to the target posterior probability.
It can be seen that, in the electronic device shown in fig. 6, the query vectors of the query sentences are obtained by encoding the input query sentences with the pre-training model; matching the query vector with at least one segment cluster in a pre-constructed clustering graph to determine a target segment cluster to which the query statement belongs from the at least one segment cluster; selecting at least one fragment from the target fragment cluster, obtaining an updated query statement according to the at least one fragment, and calculating the posterior probability of the updated query statement and a first fragment in the target fragment cluster; repeatedly executing the operation of selecting at least one segment from the target segment cluster according to the last obtained posterior probability and obtaining the current updated query statement according to the at least one segment until no segment directly connected with the at least one currently selected segment exists in the target segment cluster; and calculating the target posterior probability of the current updated query statement and the second segment in the target segment cluster, and returning the open domain question-answer result of the query statement according to the target posterior probability. The method includes the steps of firstly clustering fragment data to obtain a clustering graph, when query sentences are input, selecting a target fragment cluster from at least one fragment cluster, using the target fragment cluster as a database, screening at least one fragment from a layer in the target fragment cluster, and returning a fragment with the maximum target posterior probability and a fragment related to the fragment with the maximum target posterior probability in each layer as an open domain question-answer result.
In another embodiment, at least one fragment cluster is obtained by clustering fragment data of each field, and before encoding an input query statement using a pre-training model to obtain a query vector of the query statement, the processor 601 is further configured to:
determining a radius and a neighborhood density threshold value adopted for clustering fragment data of each field in a clustering algorithm;
and constructing a clustering graph based on the radius and the neighborhood density threshold.
In another embodiment, when determining, in the clustering algorithm, the radius used for clustering the fragment data of each field, the processor 601 performs:
coding fragment data of each field by adopting a pre-training model to obtain at least one semantic vector;
carrying out logarithmic sampling on at least one semantic vector to obtain a first target number of points;
calculating the average distance between the points of the first target number of points;
repeatedly performing operations of performing logarithmic sampling and calculating the average distance between the points on at least one semantic vector for K times to obtain the average distance between the K points, wherein K is an integer greater than 1;
the average of the average distances between K points was taken as the radius.
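The radius-estimation procedure above (logarithmic sampling repeated K times, then averaging the per-round mean distances) might be sketched as below. Drawing roughly `log2(N)` points is only one plausible reading of "logarithmic sampling", which the text leaves unspecified; all names are illustrative.

```python
import math
import random

import numpy as np

def logarithmic_sample(vectors: np.ndarray, rng: random.Random) -> np.ndarray:
    # Draw roughly log2(N) of the N semantic vectors; the exact sampling
    # scheme is not specified in the text, so log2 is an assumption.
    n = len(vectors)
    idx = rng.sample(range(n), max(2, int(math.log2(n))))
    return vectors[idx]

def average_pairwise_distance(points: np.ndarray) -> float:
    dists = [float(np.linalg.norm(a - b))
             for i, a in enumerate(points) for b in points[i + 1:]]
    return float(np.mean(dists))

def estimate_radius(vectors: np.ndarray, K: int = 5, seed: int = 0) -> float:
    # Repeat "sample, then average the pairwise distances" K times and take
    # the mean of the K per-round averages as the radius.
    rng = random.Random(seed)
    rounds = [average_pairwise_distance(logarithmic_sample(vectors, rng))
              for _ in range(K)]
    return float(np.mean(rounds))

vectors = np.random.default_rng(0).normal(size=(256, 8))  # stand-in semantic vectors
radius = estimate_radius(vectors, K=5)
```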
In another embodiment, when determining the neighborhood density threshold used by the clustering algorithm to cluster the fragment data of each domain, the processor 601 performs:
performing logarithmic sampling on the at least one semantic vector to obtain a second target number of points;
randomly selecting one of the second target number of points as a cluster center, and calculating the number of same-cluster points of the cluster center according to the radius and a predefined discriminant function;
repeating the operations of logarithmic sampling, randomly selecting a cluster center and counting its same-cluster points K times to obtain K count values;
and taking the mean of the K count values as the neighborhood density threshold.
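The neighborhood-density-threshold step can be sketched the same way. A distance-within-radius test is used here as the discriminant, but that is an assumption: the text says only "a predefined discriminant function".

```python
import math
import random

import numpy as np

def neighbourhood_count(center: np.ndarray, points: np.ndarray, radius: float) -> int:
    # Assumed discriminant: a point counts as a "same-cluster" point of the
    # center if its Euclidean distance to the center is within the radius.
    return int(np.sum(np.linalg.norm(points - center, axis=1) <= radius))

def estimate_density_threshold(vectors: np.ndarray, radius: float,
                               K: int = 5, seed: int = 0) -> float:
    rng = random.Random(seed)
    counts = []
    for _ in range(K):
        # logarithmic sampling: draw ~log2(N) points (same assumption as before)
        idx = rng.sample(range(len(vectors)), max(2, int(math.log2(len(vectors)))))
        sample = vectors[idx]
        center = sample[rng.randrange(len(sample))]   # random cluster center
        counts.append(neighbourhood_count(center, sample, radius))
    # the neighborhood density threshold is the mean of the K count values
    return float(np.mean(counts))

vectors = np.random.default_rng(1).normal(size=(256, 8))
threshold = estimate_density_threshold(vectors, radius=3.0, K=5)
```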
In yet another embodiment, when constructing the cluster graph based on the radius and the neighborhood density threshold, the processor 601 performs:
starting from any semantic vector of the at least one semantic vector, obtaining the number of neighborhood points of that semantic vector according to the radius; if the number of neighborhood points is less than the neighborhood density threshold, determining the semantic vector to be a boundary point, and if the number of neighborhood points is greater than or equal to the neighborhood density threshold, determining the semantic vector to be a core point;
if the semantic vector is a core point, grouping it together with the points that are density-reachable from it into one fragment cluster; if the semantic vector is a boundary point, adding it to the fragment cluster of the core point from which it is density-reachable; and continuing until every core point in the at least one semantic vector has been clustered, thereby obtaining the at least one fragment cluster;
and assigning an edge between the neighborhood points within each of the at least one fragment cluster to obtain the cluster graph.
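The construction above is essentially a DBSCAN-style density clustering (core versus boundary points via the radius and the density threshold) followed by adding edges between neighborhood points inside each cluster. A compact sketch under that reading, using a brute-force distance matrix and illustrative names:

```python
import numpy as np

def build_cluster_graph(vectors: np.ndarray, radius: float, min_pts: int):
    n = len(vectors)
    dist = np.linalg.norm(vectors[:, None, :] - vectors[None, :, :], axis=-1)
    neighbours = [np.flatnonzero(dist[i] <= radius) for i in range(n)]  # incl. self
    core = np.array([len(nb) >= min_pts for nb in neighbours])

    labels = np.full(n, -1)  # -1 = not yet assigned
    cluster_id = 0
    for i in range(n):
        if not core[i] or labels[i] != -1:
            continue
        # grow one fragment cluster from this core point via density reachability
        stack, labels[i] = [i], cluster_id
        while stack:
            j = stack.pop()
            for k in neighbours[j]:
                if labels[k] == -1:
                    labels[k] = cluster_id      # boundary points join the cluster too
                    if core[k]:
                        stack.append(k)         # only core points keep expanding
        cluster_id += 1

    # assign an edge between neighbourhood points inside each fragment cluster
    edges = {c: set() for c in range(cluster_id)}
    for i in range(n):
        if labels[i] == -1:
            continue
        for j in neighbours[i]:
            if j != i and labels[j] == labels[i]:
                edges[labels[i]].add((min(i, j), max(i, j)))
    return labels, core, edges

# two tight groups of 5 identical points each -> two fragment clusters
pts = np.vstack([np.zeros((5, 2)), np.full((5, 2), 10.0)])
labels, core, edges = build_cluster_graph(pts, radius=1.0, min_pts=3)
```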
In another embodiment, when matching the query vector with the at least one fragment cluster in the pre-constructed cluster graph to determine, from the at least one fragment cluster, the target fragment cluster to which the query statement belongs, the processor 601 performs:
for each of the at least one fragment cluster, calculating the average of the core points in that fragment cluster;
taking the average of the core points in each fragment cluster as the cluster center of that fragment cluster;
calculating the target distance between the query vector and the cluster center of each fragment cluster;
and determining the fragment cluster whose cluster center has the minimum target distance as the target fragment cluster.
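The cluster-selection step can be sketched directly: compute each cluster center as the mean of its core points and pick the nearest one. The dictionary layout of `clusters` is an assumption made for illustration.

```python
import numpy as np

def select_target_cluster(query_vec: np.ndarray, clusters: dict) -> int:
    # clusters: cluster id -> array of that cluster's core-point vectors
    centers = {cid: pts.mean(axis=0) for cid, pts in clusters.items()}
    dists = {cid: float(np.linalg.norm(query_vec - c)) for cid, c in centers.items()}
    return min(dists, key=dists.get)  # cluster with the minimum target distance

clusters = {0: np.zeros((4, 3)), 1: np.full((4, 3), 5.0)}
target = select_target_cluster(np.array([4.8, 5.1, 5.0]), clusters)
```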
In another embodiment, when encoding the input query statement with the pre-training model to obtain the query vector of the query statement, the processor 601 performs:
preprocessing the query statement to obtain word vectors of the query statement;
calculating a query matrix, a key matrix and a value matrix based on the word vectors;
calculating an attention weight based on the query matrix, the key matrix and the value matrix;
and multiplying the attention weight by the value matrix to obtain an attention vector, and encoding the attention vector to obtain the query vector.
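The encoding steps above correspond to single-head scaled dot-product attention. A minimal sketch, assuming random projection matrices and mean-pooling as the final "encoding" step (the text specifies neither):

```python
import numpy as np

def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def encode_query(word_vectors, Wq, Wk, Wv):
    # query / key / value matrices computed from the word vectors
    Q, K, V = word_vectors @ Wq, word_vectors @ Wk, word_vectors @ Wv
    weights = softmax(Q @ K.T / np.sqrt(K.shape[-1]))  # attention weight
    attention = weights @ V                            # attention vectors
    # final "encoding" step: mean-pool into one query vector (an assumption;
    # the text does not say how the attention vector is encoded)
    return attention.mean(axis=0)

rng = np.random.default_rng(0)
words = rng.normal(size=(6, 16))                 # 6 word vectors, dimension 16
Wq, Wk, Wv = (rng.normal(size=(16, 16)) for _ in range(3))
query_vector = encode_query(words, Wq, Wk, Wv)
```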
By way of example, the electronic device includes, but is not limited to, a processor 601, an input device 602, an output device 603, and a computer storage medium 604, and may further include a memory, a power supply, an application client module, and the like. The input device 602 may be a keyboard, a touch screen, a radio-frequency receiver, etc., and the output device 603 may be a speaker, a display, a radio-frequency transmitter, etc. Those skilled in the art will appreciate that the schematic diagrams are merely examples of an electronic device and do not limit it; the electronic device may include more or fewer components than shown, combine some components, or use different components.
It should be noted that, since the processor 601 of the electronic device implements the steps of the pre-training-model-based open-domain question-answer prediction method by executing the computer program, all the embodiments of that method are applicable to the electronic device and achieve the same or similar beneficial effects.
An embodiment of the present application further provides a computer storage medium (memory), which is a storage device in an electronic device and is used to store programs and data. It is understood that the computer storage medium herein may include a storage medium built into the terminal, and may also include an extended storage medium supported by the terminal. The computer storage medium provides a storage space that stores the operating system of the terminal. Also stored in this storage space are one or more instructions, which may be one or more computer programs (including program code), suitable for loading and execution by the processor 601. The computer storage medium may be a high-speed RAM, or a non-volatile memory such as at least one disk memory; alternatively, it may be at least one computer storage medium located remotely from the processor 601. In one embodiment, the one or more instructions stored in the computer storage medium may be loaded and executed by the processor 601 to perform the corresponding steps of the pre-training-model-based open-domain question-answer prediction method described above.
Illustratively, the computer program of the computer storage medium includes computer program code, which may be in the form of source code, object code, an executable file, some intermediate form, or the like. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a Read-Only Memory (ROM), a Random Access Memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and the like.
It should be noted that, since the computer program of the computer storage medium is executed by the processor to implement the steps in the open-domain question-answer prediction method based on the pre-trained model, all the embodiments of the open-domain question-answer prediction method based on the pre-trained model are applicable to the computer storage medium, and can achieve the same or similar beneficial effects.
The foregoing detailed description of the embodiments illustrates the principles and implementations of the present application; the description of the embodiments above is provided only to help understand the method and core concept of the present application. Meanwhile, a person skilled in the art may, following the idea of the present application, vary the specific embodiments and the application scope. In summary, the content of this specification should not be construed as limiting the present application.
Claims (10)
1. An open domain question-answer prediction method based on a pre-training model is characterized by comprising the following steps:
encoding an input query statement with a pre-training model to obtain a query vector of the query statement;
matching the query vector with at least one fragment cluster in a pre-constructed cluster graph to determine, from the at least one fragment cluster, a target fragment cluster to which the query statement belongs;
selecting at least one fragment from the target fragment cluster, obtaining an updated query statement according to the at least one fragment, and calculating the posterior probability of the updated query statement and a first fragment in the target fragment cluster;
repeatedly performing the operations of selecting at least one fragment from the target fragment cluster according to the most recently obtained posterior probability and obtaining the current updated query statement according to the at least one fragment, until no fragment in the target fragment cluster is directly connected to the at least one currently selected fragment;
and calculating the target posterior probability of the current updated query statement and a second fragment in the target fragment cluster, and returning the open-domain question-answer result of the query statement according to the target posterior probability.
2. The method according to claim 1, wherein the at least one fragment cluster is obtained by clustering fragment data of each domain, and before the encoding of the input query statement with the pre-training model to obtain the query vector of the query statement, the method further comprises:
determining a radius and a neighborhood density threshold used by a clustering algorithm to cluster the fragment data of each domain;
and constructing the cluster graph based on the radius and the neighborhood density threshold.
3. The method of claim 2, wherein the determining of the radius used by the clustering algorithm to cluster the fragment data of each domain comprises:
encoding the fragment data of each domain with the pre-training model to obtain at least one semantic vector;
performing logarithmic sampling on the at least one semantic vector to obtain a first target number of points;
calculating the average distance between the points of the first target number;
repeating the operations of logarithmic sampling and average-distance calculation on the at least one semantic vector K times to obtain K average distances, wherein K is an integer greater than 1;
and taking the mean of the K average distances as the radius.
4. The method of claim 3, wherein the determining of the neighborhood density threshold used by the clustering algorithm to cluster the fragment data of each domain comprises:
performing logarithmic sampling on the at least one semantic vector to obtain a second target number of points;
randomly selecting one of the second target number of points as a cluster center, and calculating the number of same-cluster points of the cluster center according to the radius and a predefined discriminant function;
repeating the operations of logarithmic sampling, randomly selecting a cluster center and counting its same-cluster points K times to obtain K count values;
and taking the mean of the K count values as the neighborhood density threshold.
5. The method of claim 3, wherein the constructing of the cluster graph based on the radius and the neighborhood density threshold comprises:
starting from any semantic vector of the at least one semantic vector, obtaining the number of neighborhood points of that semantic vector according to the radius; if the number of neighborhood points is less than the neighborhood density threshold, determining the semantic vector to be a boundary point, and if the number of neighborhood points is greater than or equal to the neighborhood density threshold, determining the semantic vector to be a core point;
if the semantic vector is a core point, grouping it together with the points that are density-reachable from it into one fragment cluster; if the semantic vector is a boundary point, adding it to the fragment cluster of the core point from which it is density-reachable; and continuing until every core point in the at least one semantic vector has been clustered, thereby obtaining the at least one fragment cluster;
and assigning an edge between the neighborhood points within each of the at least one fragment cluster to obtain the cluster graph.
6. The method according to any one of claims 1 to 4, wherein the matching of the query vector with the at least one fragment cluster in the pre-constructed cluster graph to determine, from the at least one fragment cluster, the target fragment cluster to which the query statement belongs comprises:
for each of the at least one fragment cluster, calculating the average of the core points in that fragment cluster;
taking the average of the core points in each fragment cluster as the cluster center of that fragment cluster;
calculating the target distance between the query vector and the cluster center of each fragment cluster;
and determining the fragment cluster whose cluster center has the minimum target distance among the at least one fragment cluster as the target fragment cluster.
7. The method of claim 1, wherein the encoding of the input query statement with the pre-training model to obtain the query vector of the query statement comprises:
preprocessing the query statement to obtain word vectors of the query statement;
calculating a query matrix, a key matrix and a value matrix based on the word vectors;
calculating an attention weight based on the query matrix, the key matrix and the value matrix;
and multiplying the attention weight by the value matrix to obtain an attention vector, and encoding the attention vector to obtain the query vector.
8. An open-domain question-answer prediction device based on a pre-training model, the device comprising:
an encoding unit, configured to encode an input query statement with a pre-training model to obtain a query vector of the query statement;
a matching unit, configured to match the query vector with at least one fragment cluster in a pre-constructed cluster graph to determine, from the at least one fragment cluster, a target fragment cluster to which the query statement belongs;
an updating unit, configured to select at least one fragment from the target fragment cluster, obtain an updated query statement according to the at least one fragment, and calculate the posterior probability of the updated query statement and a first fragment in the target fragment cluster;
the updating unit being further configured to repeatedly perform the operations of selecting at least one fragment from the target fragment cluster according to the most recently obtained posterior probability and obtaining the current updated query statement according to the at least one fragment, until no fragment in the target fragment cluster is directly connected to the at least one currently selected fragment;
and a prediction unit, configured to calculate the target posterior probability of the current updated query statement and a second fragment in the target fragment cluster, and return the open-domain question-answer result of the query statement according to the target posterior probability.
9. An electronic device, comprising an input device and an output device, and further comprising:
a processor adapted to implement one or more instructions; and
a computer storage medium having one or more instructions stored thereon, the one or more instructions adapted to be loaded by the processor and to perform the method of any of claims 1-7.
10. A computer storage medium having stored thereon one or more instructions adapted to be loaded by a processor and to perform the method of any of claims 1-7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111167748.7A CN113723115B (en) | 2021-09-30 | 2021-09-30 | Open domain question-answer prediction method based on pre-training model and related equipment |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113723115A true CN113723115A (en) | 2021-11-30 |
CN113723115B CN113723115B (en) | 2024-02-09 |
Family
ID=78685636
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111167748.7A Active CN113723115B (en) | 2021-09-30 | 2021-09-30 | Open domain question-answer prediction method based on pre-training model and related equipment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113723115B (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115687031A (en) * | 2022-11-15 | 2023-02-03 | 北京优特捷信息技术有限公司 | Method, device, equipment and medium for generating alarm description text |
WO2023108995A1 (en) * | 2021-12-15 | 2023-06-22 | 平安科技(深圳)有限公司 | Vector similarity calculation method and apparatus, device and storage medium |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20140358928A1 (en) * | 2013-06-04 | 2014-12-04 | International Business Machines Corporation | Clustering Based Question Set Generation for Training and Testing of a Question and Answer System |
US20150293970A1 (en) * | 2014-04-10 | 2015-10-15 | Beijing Baidu Netcom Science And Technology Co., Ltd. | Information searching method and device |
US20180293302A1 (en) * | 2017-04-06 | 2018-10-11 | International Business Machines Corporation | Natural question generation from query data using natural language processing system |
CN110750629A (en) * | 2019-09-18 | 2020-02-04 | 平安科技(深圳)有限公司 | Robot dialogue generation method and device, readable storage medium and robot |
CN112487173A (en) * | 2020-12-18 | 2021-03-12 | 北京百度网讯科技有限公司 | Man-machine conversation method, device and storage medium |
KR20210051523A (en) * | 2019-10-30 | 2021-05-10 | 주식회사 솔트룩스 | Dialogue system by automatic domain classfication |
CN113139042A (en) * | 2021-04-25 | 2021-07-20 | 内蒙古工业大学 | Emotion controllable reply generation method using fine-tuning and reordering strategy |
WO2021169842A1 (en) * | 2020-02-24 | 2021-09-02 | 京东方科技集团股份有限公司 | Method and apparatus for updating data, electronic device, and computer readable storage medium |
Non-Patent Citations (1)
Title |
---|
TIANJIAO GUO: "Course Question Answering System Based on Artificial Intelligence", Application of Intelligent Systems in Multi-Modal Information Analytics: 2021 International Conference on Multi-Modal Information Analytics (MMIA 2021), Advances in Intelligent Systems and Computing, vol. 2, pp. 723-730 * |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110766142A (en) | Model generation method and device | |
US9852177B1 (en) | System and method for generating automated response to an input query received from a user in a human-machine interaction environment | |
CN113723115A (en) | Open domain question-answer prediction method based on pre-training model and related equipment | |
CN113268609A (en) | Dialog content recommendation method, device, equipment and medium based on knowledge graph | |
CN114691828A (en) | Data processing method, device, equipment and medium | |
CN111507108B (en) | Alias generation method and device, electronic equipment and computer readable storage medium | |
CN114358023A (en) | Intelligent question-answer recall method and device, computer equipment and storage medium | |
CN110489730A (en) | Text handling method, device, terminal and storage medium | |
CN109474516B (en) | Method and system for recommending instant messaging connection strategy based on convolutional neural network | |
CN116957128A (en) | Service index prediction method, device, equipment and storage medium | |
Liu et al. | Beyond top‐n accuracy indicator: a comprehensive evaluation indicator of cnn models in image classification | |
CN114880991A (en) | Knowledge map question-answer entity linking method, device, equipment and medium | |
CN117795527A (en) | Evaluation of output sequences using autoregressive language model neural networks | |
CN114268625B (en) | Feature selection method, device, equipment and storage medium | |
CN111400413B (en) | Method and system for determining category of knowledge points in knowledge base | |
CN111324722B (en) | Method and system for training word weight model | |
CN110147881B (en) | Language processing method, device, equipment and storage medium | |
CN113449079B (en) | Text abstract generating method and device, electronic equipment and storage medium | |
CN116680390B (en) | Vocabulary association recommendation method and system | |
US11755570B2 (en) | Memory-based neural network for question answering | |
CN115146258B (en) | Request processing method and device, storage medium and electronic equipment | |
CN111897884B (en) | Data relationship information display method and terminal equipment | |
CN116992017A (en) | Abnormal body detection method, device, equipment and storage medium | |
CN116414963A (en) | Method, device and storage medium for inquiring reply content | |
CN117827887A (en) | Recall method, system, electronic device and storage medium for complex domain dense channel |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||