CN113723115B - Open domain question-answer prediction method based on pre-training model and related equipment - Google Patents

Open domain question-answer prediction method based on pre-training model and related equipment

Info

Publication number
CN113723115B
CN113723115B (application CN202111167748.7A)
Authority
CN
China
Prior art keywords
cluster
fragment
segment
target
query
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111167748.7A
Other languages
Chinese (zh)
Other versions
CN113723115A (en)
Inventor
成杰峰
彭奕
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd
Priority to CN202111167748.7A
Publication of CN113723115A
Application granted
Publication of CN113723115B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G06F40/35 Discourse or dialogue representation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/242 Query formulation
    • G06F16/2433 Query languages
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/335 Filtering based on additional data, e.g. user or group profiles
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/284 Lexical analysis, e.g. tokenisation or collocates
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • Databases & Information Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Mathematical Physics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Animal Behavior & Ethology (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application relates to the technical field of artificial intelligence, and in particular provides an open domain question-answer prediction method based on a pre-training model and related equipment. The method comprises the following steps: encoding the query statement to obtain a query vector; matching the query vector with at least one fragment cluster to determine the target fragment cluster to which the query statement belongs; selecting at least one fragment from the target fragment cluster, obtaining an updated query statement according to the at least one fragment, and calculating the posterior probability between the updated query statement and the fragments in the target fragment cluster; repeatedly executing the operations of selecting at least one fragment according to the posterior probability and obtaining an updated query statement according to the at least one fragment, until no fragment directly connected with the currently selected at least one fragment remains in the target fragment cluster; and calculating the posterior probability between the latest query statement and the fragments in the target fragment cluster, and returning a question-answer result according to the posterior probability. The embodiment of the application helps improve prediction efficiency in open domain question answering.

Description

Open domain question-answer prediction method based on pre-training model and related equipment
Technical Field
The application relates to the technical field of intelligent questions and answers, in particular to an open domain question and answer prediction method based on a pre-training model and related equipment.
Background
With the development of the internet, business volume in every industry has grown rapidly and the customer base has gradually shifted from offline to online. The number of human customer-service agents in each enterprise and their processing capacity fall far short of the growth of online customers, so intelligent question-answering systems are urgently needed to ease this situation. Most existing intelligent question-answering systems are based on a closed domain, i.e., the question-answer knowledge base is limited to a specific field, such as banking or insurance question answering. Driven by customer demand, researchers have proposed open-domain question answering (open-domain QA), which is not limited to question answering in a certain field but learns knowledge from massive text documents across industries (such as knowledge bases like Wikipedia), so that questions in any field can be answered. In current open-domain question-answering systems, the posterior probability is calculated between the query statement and massive fragments one by one, and the fragments with high probability are extracted, which makes prediction computationally expensive.
Disclosure of Invention
To address the above problems, the application provides an open domain question-answer prediction method based on a pre-training model and related equipment, which helps improve prediction efficiency in open domain question answering.
To achieve the above object, a first aspect of an embodiment of the present application provides an open domain question-answer prediction method based on a pre-training model, where the method includes:
encoding an input query sentence by adopting a pre-training model to obtain a query vector of the query sentence;
matching the query vector with at least one fragment cluster in a pre-constructed cluster map to determine a target fragment cluster to which the query statement belongs from the at least one fragment cluster;
selecting at least one fragment from the target fragment cluster, obtaining an updated query statement according to the at least one fragment, and calculating the posterior probability of the updated query statement and the first fragment in the target fragment cluster;
repeatedly executing the operations of selecting at least one fragment from the target fragment cluster according to the posterior probability obtained last time and obtaining the current updated query statement according to the at least one fragment until no fragment directly connected with the at least one currently selected fragment exists in the target fragment cluster;
and calculating the target posterior probability between the currently updated query statement and the second segment in the target segment cluster, and returning an open domain question-answering result of the query statement according to the target posterior probability.
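The five steps above describe an iterative expand-and-rescore loop. As a minimal sketch of that control flow (not the patent's implementation): `predict`, `word_overlap`, and the plain list/dict cluster and adjacency structures below are hypothetical stand-ins for the pre-training-model posterior and the cluster map.

```python
def word_overlap(query, fragment):
    # toy posterior: number of shared words, standing in for the
    # pre-training model's posterior probability
    return len(set(query.split()) & set(fragment.split()))

def predict(query, cluster, adjacency, posterior):
    """Pick the best-scoring fragment, append it to the query, and repeat
    over directly connected fragments until none remain; then score the
    whole cluster with the latest query and return the best fragment."""
    # initial selection over the whole target fragment cluster
    current = max(cluster, key=lambda frag: posterior(query, frag))
    query = query + " " + current
    # repeat on fragments directly connected to the current selection
    while adjacency.get(current):
        current = max(adjacency[current], key=lambda frag: posterior(query, frag))
        query = query + " " + current
    # final (target) posterior over the cluster with the latest query
    return max(cluster, key=lambda frag: posterior(query, frag))
```

Because scoring is restricted to one cluster and, within it, to fragments connected to the current selection, far fewer posterior computations are needed than scoring every fragment in the corpus.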
With reference to the first aspect, in one possible implementation manner, at least one fragment cluster is obtained by clustering fragment data in each field, and before the input query sentence is encoded by adopting the pre-training model, the method further includes:
determining a radius and a neighborhood density threshold value adopted by clustering the segment data in each field in a clustering algorithm;
and constructing a cluster map based on the radius and the neighborhood density threshold.
With reference to the first aspect, in one possible implementation manner, determining a radius adopted by clustering segment data of each domain in a clustering algorithm includes:
the segment data in each field is encoded by adopting a pre-training model, so as to obtain at least one semantic vector;
carrying out logarithmic sampling on at least one semantic vector to obtain a first target number of points;
calculating the average distance between the points of the first target number of points;
repeatedly executing K times to perform logarithmic sampling on at least one semantic vector and calculating the average distance between points to obtain K average distances between points, wherein K is an integer greater than 1;
The average value of the average distance between K points is taken as the radius.
With reference to the first aspect, in one possible implementation manner, determining a neighborhood density threshold value adopted for clustering segment data of each domain in a clustering algorithm includes:
carrying out logarithmic sampling on at least one semantic vector to obtain a second target number of points;
randomly selecting one point from the second target number of points as a clustering center, and calculating the number of similar points of the clustering center according to the radius and a predefined discriminant function;
repeatedly executing operations of carrying out logarithmic sampling on at least one semantic vector for K times, randomly selecting one point as a clustering center and calculating the number of similar points of the clustering center to obtain K number values;
the average value of the K number values is taken as a neighborhood density threshold value.
With reference to the first aspect, in a possible implementation manner, constructing the cluster map based on the radius and the neighborhood density threshold includes:
starting from any semantic vector among the at least one semantic vector, acquiring the number of neighborhood points of that semantic vector according to the radius; if the number of neighborhood points is smaller than the neighborhood density threshold, determining that semantic vector to be a boundary point, and if the number of neighborhood points is greater than or equal to the neighborhood density threshold, determining that semantic vector to be a core point;
if the semantic vector is a core point, determining the semantic vector together with the points density-reachable from it as a fragment cluster, and if the semantic vector is a boundary point, adding it to the fragment cluster of a core point from which it is density-reachable, until the core points among the at least one semantic vector have all been clustered, obtaining at least one fragment cluster;
and assigning an edge between neighborhood points within each fragment cluster of the at least one fragment cluster, to obtain the cluster map.
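The three steps above amount to a DBSCAN-style construction. A minimal sketch under simplifying assumptions (tiny in-memory point lists, plain Euclidean distance via `math.dist`; the function and variable names are illustrative):

```python
import math

def build_cluster_map(vectors, eps, minpts):
    """Label points into density-based clusters and connect neighborhood
    points within each cluster by an edge, yielding a cluster map."""
    n = len(vectors)
    # neighborhood of each point: all other points within radius eps
    neigh = [[j for j in range(n)
              if j != i and math.dist(vectors[i], vectors[j]) <= eps]
             for i in range(n)]
    core = {i for i in range(n) if len(neigh[i]) >= minpts}  # else boundary
    labels = [-1] * n
    cid = 0
    for i in sorted(core):
        if labels[i] != -1:
            continue
        # grow a fragment cluster from this core point (density-reachable set)
        labels[i] = cid
        stack = [i]
        while stack:
            p = stack.pop()
            for q in neigh[p]:
                if labels[q] == -1:
                    labels[q] = cid      # boundary points join the cluster
                    if q in core:
                        stack.append(q)  # only core points keep expanding
        cid += 1
    # give an edge to neighborhood points inside the same cluster
    edges = {(min(i, j), max(i, j))
             for i in range(n) for j in neigh[i]
             if labels[i] == labels[j] != -1}
    return labels, edges
```

The returned `edges` set is what makes "directly connected" fragments well-defined during the later layer-by-layer screening.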
With reference to the first aspect, in one possible implementation manner, matching the query vector with at least one segment cluster in the pre-constructed cluster map to determine a target segment cluster to which the query statement belongs from the at least one segment cluster includes:
calculating an average value of core points in each segment cluster for each segment cluster of the at least one segment cluster;
taking the average value of core points in each segment cluster as the clustering center of each segment cluster;
calculating the target distance between the query vector and the clustering center of each fragment cluster;
and determining the fragment cluster represented by the cluster center with the smallest target distance in at least one fragment cluster as a target fragment cluster.
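The matching step above reduces to a nearest-centroid lookup. A sketch, assuming each cluster is given as the list of its core-point vectors (the names `match_cluster` and `clusters` are illustrative):

```python
import math

def match_cluster(query_vec, clusters):
    """clusters maps a cluster id to the list of its core-point vectors.
    The cluster center is the mean of the core points; the query is
    assigned to the center at the smallest Euclidean (target) distance."""
    centers = {
        cid: tuple(sum(dim) / len(pts) for dim in zip(*pts))
        for cid, pts in clusters.items()
    }
    return min(centers, key=lambda cid: math.dist(query_vec, centers[cid]))
```

In practice the centers would be precomputed once per cluster rather than on every query.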
With reference to the first aspect, in one possible implementation manner, the encoding the input query statement using the pre-training model to obtain a query vector of the query statement includes:
preprocessing the query sentence to obtain a word vector of the query sentence;
obtaining a query matrix, a key matrix and a value matrix based on word vector calculation;
calculating attention weights based on the query matrix, the key matrix and the value matrix;
the attention weight is multiplied by the value matrix to obtain an attention vector, and the attention vector is encoded to obtain a query vector.
A second aspect of the embodiments of the present application provides an open domain question-answer prediction apparatus based on a pre-training model, where the apparatus includes:
the coding unit is used for coding the input query statement by adopting the pre-training model to obtain a query vector of the query statement;
the matching unit is used for matching the query vector with at least one fragment cluster in the pre-constructed cluster map so as to determine a target fragment cluster to which the query statement belongs from the at least one fragment cluster;
the updating unit is used for selecting at least one fragment from the target fragment cluster, obtaining an updated query statement according to the at least one fragment, and calculating the posterior probability of the updated query statement and the first fragment in the target fragment cluster;
The updating unit is further used for repeatedly executing the operations of selecting at least one fragment from the target fragment cluster according to the posterior probability obtained last time and obtaining the current updated query statement according to the at least one fragment until no fragment directly connected with the at least one currently selected fragment exists in the target fragment cluster;
the prediction unit is used for calculating the target posterior probability between the currently updated query statement and the second segment in the target segment cluster, and returning an open domain question-answer result of the query statement according to the target posterior probability.
A third aspect of the embodiments of the present application provides an electronic device, including an input device and an output device, and further including a processor adapted to implement one or more instructions; and a computer storage medium storing one or more instructions adapted to be loaded by the processor and to perform the steps of:
encoding an input query sentence by adopting a pre-training model to obtain a query vector of the query sentence;
matching the query vector with at least one fragment cluster in a pre-constructed cluster map to determine a target fragment cluster to which the query statement belongs from the at least one fragment cluster;
selecting at least one fragment from the target fragment cluster, obtaining an updated query statement according to the at least one fragment, and calculating the posterior probability of the updated query statement and the first fragment in the target fragment cluster;
repeatedly executing the operations of selecting at least one fragment from the target fragment cluster according to the posterior probability obtained last time and obtaining the current updated query statement according to the at least one fragment until no fragment directly connected with the at least one currently selected fragment exists in the target fragment cluster;
and calculating the target posterior probability between the currently updated query statement and the second segment in the target segment cluster, and returning an open domain question-answering result of the query statement according to the target posterior probability.
A fourth aspect of the present embodiments provides a computer storage medium storing one or more instructions adapted to be loaded by a processor and to perform the steps of:
encoding an input query sentence by adopting a pre-training model to obtain a query vector of the query sentence;
matching the query vector with at least one fragment cluster in a pre-constructed cluster map to determine a target fragment cluster to which the query statement belongs from the at least one fragment cluster;
selecting at least one fragment from the target fragment cluster, obtaining an updated query statement according to the at least one fragment, and calculating the posterior probability of the updated query statement and the first fragment in the target fragment cluster;
repeatedly executing the operations of selecting at least one fragment from the target fragment cluster according to the posterior probability obtained last time and obtaining the current updated query statement according to the at least one fragment until no fragment directly connected with the at least one currently selected fragment exists in the target fragment cluster;
and calculating the target posterior probability between the currently updated query statement and the second segment in the target segment cluster, and returning an open domain question-answering result of the query statement according to the target posterior probability.
The scheme of the application at least comprises the following beneficial effects:
in the embodiment of the application, the input query statement is encoded with a pre-training model to obtain the query vector of the query statement; the query vector is matched against at least one fragment cluster in a pre-constructed cluster map to determine, from the at least one fragment cluster, the target fragment cluster to which the query statement belongs; at least one fragment is selected from the target fragment cluster, an updated query statement is obtained according to the at least one fragment, and the posterior probability between the updated query statement and the first fragment in the target fragment cluster is calculated; the operations of selecting at least one fragment from the target fragment cluster according to the most recently obtained posterior probability and obtaining the currently updated query statement according to the at least one fragment are repeated until no fragment directly connected with the currently selected at least one fragment remains in the target fragment cluster; and the target posterior probability between the currently updated query statement and the second fragment in the target fragment cluster is calculated, and an open domain question-answering result of the query statement is returned according to the target posterior probability. In this way, when a query statement is input, a target fragment cluster is first selected from the at least one fragment cluster and used as the database, fragments are then screened layer by layer within the target fragment cluster, and the fragment with the maximum target posterior probability, together with the fragments related to it at each layer, is returned as the open domain question-answering result.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a schematic diagram of an application environment provided in an embodiment of the present application;
fig. 2 is a flow chart of an open domain question-answer prediction method based on a pre-training model according to an embodiment of the present application;
fig. 3 is a schematic diagram of generating a cluster map according to an embodiment of the present application;
FIG. 4 is a flowchart of another open domain question-answer prediction method based on a pre-training model according to an embodiment of the present application;
fig. 5 is a schematic structural diagram of an open domain question-answer prediction device based on a pre-training model according to an embodiment of the present application;
fig. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
In order to make the present application solution better understood by those skilled in the art, the following description will be made in detail and with reference to the accompanying drawings in the embodiments of the present application, it is apparent that the described embodiments are only some embodiments of the present application, not all embodiments. All other embodiments, which can be made by one of ordinary skill in the art based on the embodiments herein without making any inventive effort, shall fall within the scope of the present application.
The terms "comprising" and "having" and any variations thereof, as used in the specification, claims and drawings, are intended to cover a non-exclusive inclusion. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those listed steps or elements but may include other steps or elements not listed or inherent to such process, method, article, or apparatus. Furthermore, the terms "first," "second," and "third," etc. are used for distinguishing between different objects and not for describing a particular sequential order.
The embodiment of the application provides an open domain question-answer prediction method based on a pre-training model, which can be implemented based on an application environment shown in fig. 1, please refer to fig. 1, wherein the application environment comprises an electronic device and a user device connected with the electronic device through a network. The user equipment is provided with an input interface for receiving query sentences input by the user, such as the query sentences of the user on the commodity details, and is also provided with a communication interface for transmitting the query sentences to the electronic equipment. The electronic equipment receives the query statement through the communication interface of the electronic equipment and transmits the query statement to the processor so that the processor executes the open domain question-answer prediction method based on the pre-training model. Because the electronic equipment reduces the query range to the target fragment cluster without querying in each fragment cluster, the query calculation amount is greatly reduced, and the prediction efficiency in open domain question and answer is improved.
The electronic device may be a stand-alone server, or may be a cloud server that provides cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communications, middleware services, domain name services, security services, content delivery networks (Content Delivery Network, CDN), and basic cloud computing services such as big data and artificial intelligence platforms. Any of the at least one terminal may be a smart phone, a computer, a wearable device, an in-vehicle device, etc.
Based on the application environment shown in fig. 1, the open domain question-answer prediction method based on the pre-training model provided in the embodiment of the present application is described in detail below in conjunction with other drawings.
Referring to fig. 2, fig. 2 is a flowchart of an open domain question-answer prediction method based on a pre-training model according to an embodiment of the present application, where the method is applied to an electronic device, as shown in fig. 2, and includes steps 201-205:
201: and encoding the input query statement by adopting a pre-training model to obtain a query vector of the query statement.
In a specific embodiment of the present application, the pre-training model may be a BERT (Bidirectional Encoder Representations from Transformers) model; the BERT model is trained and fine-tuned in advance on data from each domain, so that the model can learn deep information in each domain's data. Illustratively, encoding the input query statement with the pre-training model to obtain the query vector of the query statement comprises the following steps:
preprocessing the query sentence to obtain a word vector of the query sentence;
obtaining a query matrix, a key matrix and a value matrix based on word vector calculation;
calculating attention weights based on the query matrix, the key matrix and the value matrix;
the attention weight is multiplied by the value matrix to obtain an attention vector, and the attention vector is encoded to obtain a query vector.
It should be appreciated that the BERT model encodes input through Transformer encoders. The bottom encoder first preprocesses the input query statement (Query) to obtain the corresponding word vectors, for example via word embeddings or one-hot encoding. The self-attention layer of the Transformer encoder constructs the corresponding query vectors q, key vectors k and value vectors v based on the word vectors, and multiplies q, k and v by the pre-trained query weight matrix $W^Q$, key weight matrix $W^K$ and value weight matrix $W^V$ respectively to obtain the query matrix Q, key matrix K and value matrix V. The attention weight is then calculated as $\mathrm{softmax}\left(QK^{T}/\sqrt{d_k}\right)$, where $d_k$ is the dimension of the key vectors. Finally, the attention weight is multiplied by the value matrix V to obtain the output attention vector of the self-attention layer, and the attention vector is encoded by a feedforward neural network to obtain the query vector.
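To make the attention computation concrete, here is a pure-Python sketch of a single scaled dot-product attention head on tiny matrices; q, k and v are assumed to have already been multiplied by the pre-trained weight matrices, and the function name `attention` is illustrative:

```python
import math

def attention(q, k, v):
    """weights = softmax(Q K^T / sqrt(d_k)); output = weights . V.
    q, k, v are lists of row vectors (tiny Q, K, V matrices)."""
    d_k = len(k[0])
    # score matrix: Q K^T scaled by sqrt(d_k)
    scores = [[sum(qi * ki for qi, ki in zip(qrow, krow)) / math.sqrt(d_k)
               for krow in k] for qrow in q]
    # row-wise softmax -> attention weights
    weights = []
    for row in scores:
        m = max(row)
        exps = [math.exp(s - m) for s in row]
        z = sum(exps)
        weights.append([e / z for e in exps])
    # attention weight times the value matrix -> attention vectors
    return [[sum(w * vrow[j] for w, vrow in zip(wrow, v))
             for j in range(len(v[0]))] for wrow in weights]
```

A real implementation would use batched tensor operations and multiple heads; the arithmetic per head is exactly this.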
202: matching the query vector with at least one fragment cluster in the pre-constructed cluster map to determine a target fragment cluster to which the query statement belongs from the at least one fragment cluster.
In a specific embodiment of the present application, at least one fragment cluster is obtained by clustering fragment data in each field, and before an input query sentence is encoded by adopting a pre-training model, the method further includes:
determining a radius and a neighborhood density threshold value adopted by clustering the segment data in each field in a clustering algorithm;
and constructing a cluster map based on the radius and the neighborhood density threshold.
Illustratively, determining a radius used for clustering segment data of each domain in a clustering algorithm includes:
the segment data in each field is encoded by adopting a pre-training model, so as to obtain at least one semantic vector;
carrying out logarithmic sampling on at least one semantic vector to obtain a first target number of points;
calculating the average distance between the points of the first target number of points;
repeatedly executing K times to perform logarithmic sampling on at least one semantic vector and calculating the average distance between points to obtain K average distances between points, wherein K is an integer greater than 1;
The average value of the average distance between K points is taken as the radius.
Specifically, assume that the number of semantic vectors is $n$ and that each logarithmic sampling yields $m=\lceil\log n\rceil$ points, i.e. the first target number is $m$. For any two of the $m$ sampled points $x_i$ and $x_j$, using the Euclidean distance as the distance measure, the average distance between the $m$ points is expressed as follows:

$$\bar{d}=\frac{2}{m(m-1)}\sum_{i=1}^{m}\sum_{j=i+1}^{m}\mathrm{dist}(x_i,x_j)$$

where $\bar{d}$ denotes the average distance between the $m$ points and $\mathrm{dist}(x_i,x_j)$ denotes the Euclidean distance between points $x_i$ and $x_j$.
For the at least one semantic vector, in order to avoid sampling imbalance, the operations of logarithmically sampling the semantic vectors and calculating the average inter-point distance are repeated K times to obtain K average distances, and the mean of the K averages is taken as the neighborhood radius Eps, with the formula:

$$\mathrm{Eps}=\frac{1}{K}\sum_{k=1}^{K}\bar{d}_k$$

where $\bar{d}_1,\bar{d}_2,\dots,\bar{d}_K$ denote the average inter-point distances of the 1st, 2nd, …, K-th rounds and Eps denotes the final neighborhood radius. In this embodiment, $\lceil\log n\rceil$ sampled points are used because the number of samples in an open-domain scenario is very large, usually tens of millions or more, and computing all pairwise distances would require on the order of $O(n^2)$ operations. Logarithmic sampling therefore significantly reduces the amount of computation, while repeating the sampling K times addresses the problem of sampling imbalance.
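The radius estimate above can be sketched as follows; `k_rounds` and the seeded `random.Random` sampler are illustrative choices, not taken from the embodiment:

```python
import math
import random

def estimate_eps(vectors, k_rounds=5, seed=0):
    """Repeat K rounds of sampling m = ceil(log n) points, compute the
    mean pairwise Euclidean distance per round, and average the K means."""
    rng = random.Random(seed)
    m = max(2, math.ceil(math.log(len(vectors))))  # logarithmic sample size
    round_means = []
    for _ in range(k_rounds):
        sample = rng.sample(vectors, m)
        dists = [math.dist(a, b)
                 for i, a in enumerate(sample) for b in sample[i + 1:]]
        round_means.append(sum(dists) / len(dists))
    return sum(round_means) / k_rounds
```

Each round touches only m(m-1)/2 pairs, so the cost stays polylogarithmic in the corpus size instead of quadratic.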
Illustratively, determining a neighborhood density threshold value used for clustering segment data of each domain in a clustering algorithm includes:
carrying out logarithmic sampling on at least one semantic vector to obtain a second target number of points;
randomly selecting one point from the second target number of points as a clustering center, and calculating the number of similar points of the clustering center according to the radius and a predefined discriminant function;
repeatedly executing operations of carrying out logarithmic sampling on at least one semantic vector for K times, randomly selecting one point as a clustering center and calculating the number of similar points of the clustering center to obtain K number values;
the average value of the K number values is taken as a neighborhood density threshold value.
Specifically, in order to reduce the amount of computation, when determining the neighborhood density threshold, the at least one semantic vector is likewise logarithmically sampled to obtain $\log N$ points, i.e. the second target number is $\log N$. One point $c$ is then randomly selected from these $\log N$ points as a cluster center, and the number of points belonging to the same category as the cluster center is calculated based on the previously determined radius parameter Eps and a predefined discriminant function, the discriminant function being defined as:

$$f(x_i)=\begin{cases}1, & d(x_i,c)/Eps<1\\ 0, & \text{otherwise}\end{cases}$$

wherein $d(x_i,c)/Eps$ represents the ratio of the two-point distance to Eps, and the discriminant function $f$ expresses that, for a given point, a nearby point whose distance to it is smaller than Eps is a same-category point. The number of same-category points $n$ is calculated as:

$$n=\sum_{i=1}^{\log N} f(x_i)$$
wherein $n$ represents the number of same-category points of the cluster center $c$ in a single computation, and $d(x_i,c)$ represents the Euclidean distance between point $x_i$ and the cluster center $c$. Similarly to the radius parameter Eps, in order to avoid the problem of sampling imbalance, the operations of logarithmic sampling, selecting a cluster center and calculating the number of same-category points of the cluster center are repeated K times to obtain K count values, and the average of the K count values is calculated as the neighborhood density threshold MinPts, the formula being as follows:

$$MinPts=\frac{1}{K}\sum_{k=1}^{K} n_k$$

wherein $n_1,n_2,\ldots,n_K$ respectively represent the 1st, 2nd, …, K-th count values. In this embodiment, as with the radius parameter Eps, the neighborhood density threshold MinPts is determined in an adaptive manner. In the clustering, the radius Eps and the neighborhood density threshold MinPts need to be determined in advance, and different choices of these values tend to produce distinctly different clustering results, which affects the accuracy of the final returned result. The selection of the parameters Eps and MinPts is therefore very important: selecting them by rule of thumb often causes considerable instability, whereas the adaptive parameter selection method based on the segment data prepared in advance increases the stability of the clusters and thus significantly reduces the fluctuation of the question-answer results.
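A minimal sketch of the adaptive MinPts estimation, under the same assumptions as the radius sketch (base-2 logarithmic sample size, NumPy); whether the cluster center counts itself as a same-category point is an assumption of this sketch:

```python
import numpy as np

def estimate_minpts(vectors: np.ndarray, eps: float,
                    k: int = 5, seed: int = 0) -> int:
    """Adaptively estimate MinPts: K times, log-sample the vectors, pick a
    random cluster center, count same-category points via the discriminant
    d(x, c)/eps < 1, then average the K counts."""
    rng = np.random.default_rng(seed)
    n = len(vectors)
    m = max(2, int(np.log2(n)))          # logarithmic sample size (assumed base 2)
    counts = []
    for _ in range(k):
        sample = vectors[rng.choice(n, size=m, replace=False)]
        center = sample[rng.integers(m)]
        dist = np.linalg.norm(sample - center, axis=1)
        # discriminant function: same-category if distance ratio is below 1
        counts.append(int((dist / eps < 1.0).sum()))
    return max(1, round(float(np.mean(counts))))
```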
Illustratively, constructing the cluster map based on the radius and the neighborhood density threshold includes:
starting from any semantic vector in at least one semantic vector, acquiring the number of neighborhood points of the any semantic vector according to the radius, determining the any semantic vector as a boundary point if the number of the neighborhood points is smaller than a neighborhood density threshold value, and determining the any semantic vector as a core point if the number of the neighborhood points is larger than or equal to the neighborhood density threshold value;
if any semantic vector is a core point, determining a point with reachable any semantic vector density and any semantic vector density as a fragment cluster, and if any semantic vector is a boundary point, adding any semantic vector into the fragment cluster to which the core point with reachable any semantic vector density belongs until the core point in at least one semantic vector is clustered to obtain at least one fragment cluster;
and giving an edge to the neighborhood point in each segment cluster in at least one segment cluster to obtain a cluster map.
Specifically, each piece of segment data corresponds to a semantic vector, and each semantic vector is represented as a point in a high-dimensional space. Let any semantic vector be represented as a point p. The number of neighborhood points of the point p is determined according to the preset radius Eps: if the number of neighborhood points of the point p is smaller than the neighborhood density threshold MinPts, the point p is a boundary point; if the number of neighborhood points of the point p is greater than or equal to the neighborhood density threshold MinPts, the point p is a core point. As shown in fig. 3, assuming the neighborhood density threshold MinPts is 3, there are 3 points in the neighborhood of the point p, so the point p is a core point; there are only two points in the neighborhood of the point q, so the point q is a boundary point. If the point p is a core point, a segment cluster can be determined, and the points that are density-reachable from the point p belong to that segment cluster; if the point p is a boundary point, the point p is assigned to the segment cluster of the core point from which it is density-reachable. The segment clusters to which all core points belong are determined according to this method, obtaining at least one segment cluster. For each segment cluster of the at least one segment cluster, an edge is assigned between neighboring points in the segment cluster; for example, in fig. 3, the point p is in the neighborhood of the point q, so an edge is assigned between the point p and the point q, and the point p is in the neighborhood of the point s, so an edge is assigned between the point p and the point s. A graph corresponding to each segment cluster is thus obtained, all the graphs together form the cluster map, and the cluster map is stored for subsequent matching.
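The core-point/boundary-point classification, cluster expansion and edge assignment described above can be sketched as a small DBSCAN-style routine; the brute-force distance matrix and the exact data structures are illustrative assumptions, not the embodiment's implementation:

```python
import numpy as np
from collections import defaultdict

def build_cluster_graph(vectors, eps, minpts):
    """Classify each point as core or boundary by its Eps-neighborhood size,
    grow segment clusters from core points, and assign an edge between every
    pair of neighboring points inside a cluster."""
    n = len(vectors)
    dist = np.linalg.norm(vectors[:, None] - vectors[None, :], axis=-1)
    neighbors = [set(np.flatnonzero(dist[i] < eps)) - {i} for i in range(n)]
    core = [len(neighbors[i]) >= minpts for i in range(n)]
    labels = [-1] * n                    # -1 = not yet assigned to a cluster
    edges = defaultdict(set)
    cluster_id = 0
    for i in range(n):
        if not core[i] or labels[i] != -1:
            continue
        labels[i] = cluster_id           # expand a new cluster from core point i
        frontier = [i]
        while frontier:
            p = frontier.pop()
            for q in neighbors[p]:
                edges[p].add(q); edges[q].add(p)   # edge between neighboring points
                if labels[q] == -1:
                    labels[q] = cluster_id
                    if core[q]:          # only core points propagate the expansion
                        frontier.append(q)
        cluster_id += 1
    return labels, edges
```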
Illustratively, matching the query vector with at least one fragment cluster in the pre-constructed cluster map to determine a target fragment cluster to which the query statement belongs from the at least one fragment cluster, including:
calculating an average value of core points in each segment cluster for each segment cluster of the at least one segment cluster;
taking the average value of core points in each segment cluster as the clustering center of each segment cluster;
calculating the target distance between the query vector and the clustering center of each fragment cluster;
and determining the fragment cluster represented by the cluster center with the smallest target distance in at least one fragment cluster as a target fragment cluster.
If the target distance between the query vector and a cluster center is the smallest, that cluster center is the closest to the query vector, indicating that the query sentence belongs to the category represented by that cluster center.
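The matching steps listed above (cluster center as the mean of core points, smallest target distance wins) can be sketched as follows; the dictionary representation of clusters is an assumption of this sketch:

```python
import numpy as np

def match_target_cluster(query_vec, clusters):
    """Return the id of the target segment cluster: each cluster's center is
    the mean of its core points, and the query belongs to the cluster whose
    center has the smallest Euclidean distance to the query vector."""
    centers = {cid: np.mean(core_pts, axis=0) for cid, core_pts in clusters.items()}
    dists = {cid: np.linalg.norm(query_vec - c) for cid, c in centers.items()}
    return min(dists, key=dists.get)     # cluster with smallest target distance
```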
203: at least one segment is selected from the target segment cluster, an updated query statement is obtained according to the at least one segment, and the posterior probability of the updated query statement and the first segment in the target segment cluster is calculated.
In a specific embodiment of the present application, the target segment cluster is used as the query database. The posterior probability of the query vector with each segment in the target segment cluster is calculated, the segments in the target segment cluster are sorted according to the posterior probability, and at least one segment with a posterior probability greater than or equal to a preset value is selected; for example, the at least one segment is P1, P2 and P3, respectively. P1, P2 and P3 are each combined with the query sentence to obtain updated query sentences; for example, P1 forms the updated query sentence "Query P1". The updated query sentence "Query P1" is then used as a new input, and the posterior probability of the updated query sentence "Query P1" with each first segment in the target segment cluster is calculated, wherein the first segments refer to the segments in the target segment cluster other than P1.
204: and repeatedly executing the operations of selecting at least one fragment from the target fragment cluster according to the posterior probability obtained last time and obtaining the current updated query statement according to the at least one fragment until no fragment directly connected with the at least one currently selected fragment exists in the target fragment cluster.
In this embodiment of the present application, according to the posterior probability obtained by the previous calculation, the segments other than P1 in the target segment cluster are sorted, and at least one segment with a posterior probability greater than or equal to a preset value is selected; for example, the at least one segment is P11, P12 and P13, respectively. P11, P12 and P13 are each combined with the last input "Query P1" to form updated query sentences; for example, P12 forms the current updated query sentence "Query P1 P12". The above operations are repeated until, on each of the paths of P11, P12 and P13, the at least one currently selected segment has no segment directly connected to it in the target segment cluster, that is, until analysis of the cluster map shows that the target segment cluster contains no segment correlated with the at least one currently selected segment.
205: and calculating the target posterior probability of the second segment in the current updated query sentence and target segment cluster, and returning an open domain question-answering result of the query sentence according to the target posterior probability.
In a specific embodiment of the present application, assuming that after the current updated query sentence "Query P1 P12" is formed there is no segment directly connected to P12 in the target segment cluster, the updating of the input is stopped, and the target posterior probability of the current updated query sentence "Query P1 P12" with each second segment in the target segment cluster is calculated, wherein the second segments refer to the segments in the target segment cluster other than P12. The second segments are sorted according to the target posterior probability, the segment with the maximum target posterior probability, such as P115, is selected, and P115, P12 and P1 are used as the open domain question-answer result of the query sentence, which is then returned to the user. Of course, the above is only an example; in an actual scenario, there are also updated query sentences formed from P2 and P3, and the target posterior probability is the largest posterior probability among all current updated query sentences. The number of segments selected each time may be the same or different according to the requirement on the correlation between segments; for example, the calculated posterior probabilities tend to decrease overall as the input is updated further, so the number of selected segments may follow a decreasing trend to reduce the amount of computation.
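Steps 203-205 can be sketched as the following illustrative routine. The `posterior` scoring function, the `connected` adjacency lookup on the cluster map and the fixed selection threshold are hypothetical stand-ins for the embodiment's components, and the traversal order is just one possible choice:

```python
def open_domain_answer(query, segments, connected, posterior, threshold):
    """Iteratively append segments whose posterior probability with the
    current updated query meets the threshold; a path stops when its last
    segment has no directly connected segment left; the remaining segments
    are then scored against the fully updated query and the best path wins."""
    results = []
    stack = [(query, [])]                      # (updated query, chosen segments)
    while stack:
        q, path = stack.pop()
        pool = connected(path[-1]) if path else segments
        picks = [s for s in pool if s not in path and posterior(q, s) >= threshold]
        if not picks:                          # no directly connected segment remains
            rest = [s for s in segments if s not in path]
            if rest:                           # score the second segments, keep the best
                best = max(rest, key=lambda s: posterior(q, s))
                results.append((posterior(q, best), path + [best]))
            continue
        for s in picks:                        # extend the updated query on each path
            stack.append((q + " " + s, path + [s]))
    return max(results)[1] if results else []  # segments of the best-scoring path
```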
It can be seen that in the embodiment of the present application, by encoding an input query sentence by using a pre-training model, a query vector of the query sentence is obtained; matching the query vector with at least one fragment cluster in a pre-constructed cluster map to determine a target fragment cluster to which the query statement belongs from the at least one fragment cluster; selecting at least one segment from the target segment cluster, obtaining an updated query statement according to the at least one segment, and calculating the posterior probability of the updated query statement and the first segment in the target segment cluster; repeatedly executing the operations of selecting at least one fragment from the target fragment cluster according to the posterior probability obtained last time and obtaining a current updated query statement according to the at least one fragment until no fragment directly connected with the at least one currently selected fragment exists in the target fragment cluster; and calculating the target posterior probability of the second segment in the current updated query sentence and the target segment cluster, and returning an open domain question-answer result of the query sentence according to the target posterior probability. When a query statement is input, a target fragment cluster is selected from at least one fragment cluster, the target fragment cluster is used as a database, at least one fragment is screened from layers in the target fragment cluster, and the fragment with the maximum target posterior probability and the fragment which is related to the fragment with the maximum target posterior probability in each layer are returned as open domain question-answering results.
Referring to fig. 4, fig. 4 is a flowchart of another open domain question-answer prediction method based on a pre-training model provided in an embodiment of the present application; as shown in fig. 4, the method includes steps 401-410:
401: the segment data in each field is encoded by adopting a pre-training model, so as to obtain at least one semantic vector;
402: carrying out logarithmic sampling on at least one semantic vector to obtain a first target number of points, and calculating the average distance between the points of the first target number of points;
403: repeatedly executing the operations of carrying out logarithmic sampling on at least one semantic vector and calculating the average distance between the points for K times to obtain K average distances between the points, and taking the average value of the K average distances between the points as the radius of the cluster;
404: carrying out logarithmic sampling on at least one semantic vector to obtain a second target number of points, randomly selecting one point from the second target number of points as a clustering center, and calculating the number of similar points of the clustering center according to the radius and a predefined discriminant function;
405: repeatedly executing operations of carrying out logarithmic sampling on at least one semantic vector for K times, randomly selecting a point as a clustering center and calculating the number of similar points of the clustering center to obtain K number values, and taking the average value of the K number values as a neighborhood density threshold value of the clustering;
406: encoding an input query sentence by adopting a pre-training model to obtain a query vector of the query sentence;
407: matching the query vector with at least one fragment cluster in a pre-constructed cluster map to determine a target fragment cluster to which the query statement belongs from the at least one fragment cluster; the at least one fragment cluster is obtained by clustering based on a radius and a neighborhood density threshold;
408: selecting at least one fragment from the target fragment cluster, obtaining an updated query statement according to the at least one fragment, and calculating the posterior probability of the updated query statement and the first fragment in the target fragment cluster;
409: repeatedly executing the operations of selecting at least one fragment from the target fragment cluster according to the posterior probability obtained last time and obtaining the current updated query statement according to the at least one fragment until no fragment directly connected with the at least one currently selected fragment exists in the target fragment cluster;
410: and calculating the target posterior probability of the second segment in the current updated query sentence and target segment cluster, and returning an open domain question-answering result of the query sentence according to the target posterior probability.
The specific implementation of steps 401-410 is described in the embodiment shown in fig. 2, and the same or similar advantages can be achieved, so that repetition is avoided and detailed description is omitted here.
Based on the above description of the embodiments of the open domain question-answer prediction method based on the pre-training model, please refer to fig. 5. Fig. 5 is a schematic structural diagram of an open domain question-answer prediction device based on a pre-training model provided in an embodiment of the present application; as shown in fig. 5, the device includes:
an encoding unit 501, configured to encode an input query sentence by using a pre-training model, to obtain a query vector of the query sentence;
the matching unit 502 is configured to match the query vector with at least one segment cluster in the pre-constructed cluster map, so as to determine a target segment cluster to which the query statement belongs from the at least one segment cluster;
an updating unit 503, configured to select at least one segment from the target segment cluster, obtain an updated query statement according to the at least one segment, and calculate a posterior probability of the updated query statement and a first segment in the target segment cluster;
the updating unit 503 is further configured to repeatedly perform an operation of selecting at least one segment from the target segment cluster according to the posterior probability obtained last time, and obtaining a current updated query statement according to the at least one segment, until no segment directly connected to the at least one currently selected segment exists in the target segment cluster;
And a prediction unit 504, configured to calculate a target posterior probability of the second segment in the current updated query sentence and the target segment cluster, and return an open domain question-answer result of the query sentence according to the target posterior probability.
It can be seen that, in the open domain question-answer prediction device based on the pre-training model shown in fig. 5, the pre-training model is adopted to encode the input query sentence, so as to obtain the query vector of the query sentence; matching the query vector with at least one fragment cluster in a pre-constructed cluster map to determine a target fragment cluster to which the query statement belongs from the at least one fragment cluster; selecting at least one fragment from the target fragment cluster, obtaining an updated query statement according to the at least one fragment, and calculating the posterior probability of the updated query statement and the first fragment in the target fragment cluster; repeatedly executing the operations of selecting at least one fragment from the target fragment cluster according to the posterior probability obtained last time and obtaining the current updated query statement according to the at least one fragment until no fragment directly connected with the at least one currently selected fragment exists in the target fragment cluster; and calculating the target posterior probability of the second segment in the current updated query sentence and target segment cluster, and returning an open domain question-answering result of the query sentence according to the target posterior probability. When a query statement is input, a target fragment cluster is selected from at least one fragment cluster, the target fragment cluster is used as a database, at least one fragment is screened from layers in the target fragment cluster, and the fragment with the maximum target posterior probability and the fragment which is related to the fragment with the maximum target posterior probability in each layer are returned as open domain question-answering results.
In a possible implementation manner, at least one fragment cluster is obtained by clustering fragment data in each field, and the encoding unit 501 is further configured to:
determining a radius and a neighborhood density threshold value adopted by clustering the segment data in each field in a clustering algorithm;
and constructing a cluster map based on the radius and the neighborhood density threshold.
In one possible implementation, the encoding unit 501 is specifically configured to, in determining a radius used for clustering segment data of each domain in the clustering algorithm:
the segment data in each field is encoded by adopting a pre-training model, so as to obtain at least one semantic vector;
carrying out logarithmic sampling on at least one semantic vector to obtain a first target number of points;
calculating the average distance between the points of the first target number of points;
repeatedly executing K times to perform logarithmic sampling on at least one semantic vector and calculating the average distance between points to obtain K average distances between points, wherein K is an integer greater than 1;
the average value of the average distance between K points is taken as the radius.
In one possible implementation, in determining a neighborhood density threshold value used for clustering segment data of each domain in the clustering algorithm, the encoding unit 501 is specifically configured to:
Carrying out logarithmic sampling on at least one semantic vector to obtain a second target number of points;
randomly selecting one point from the second target number of points as a clustering center, and calculating the number of similar points of the clustering center according to the radius and a predefined discriminant function;
repeatedly executing operations of carrying out logarithmic sampling on at least one semantic vector for K times, randomly selecting one point as a clustering center and calculating the number of similar points of the clustering center to obtain K number values;
the average value of the K number values is taken as a neighborhood density threshold value.
In a possible implementation, the coding unit 501 is specifically configured to, in constructing the cluster map based on a radius and a neighborhood density threshold:
starting from any semantic vector in at least one semantic vector, acquiring the number of neighborhood points of the any semantic vector according to the radius, determining the any semantic vector as a boundary point if the number of the neighborhood points is smaller than a neighborhood density threshold value, and determining the any semantic vector as a core point if the number of the neighborhood points is larger than or equal to the neighborhood density threshold value;
if any semantic vector is a core point, determining a point with reachable any semantic vector density and any semantic vector density as a fragment cluster, and if any semantic vector is a boundary point, adding any semantic vector into the fragment cluster to which the core point with reachable any semantic vector density belongs until the core point in at least one semantic vector is clustered to obtain at least one fragment cluster;
And giving an edge to the neighborhood point in each segment cluster in at least one segment cluster to obtain a cluster map.
In one possible implementation manner, in matching the query vector with at least one segment cluster in the pre-constructed cluster map to determine, from the at least one segment cluster, a target segment cluster to which the query statement belongs, the matching unit 502 is specifically configured to:
calculating an average value of core points in each segment cluster for each segment cluster of the at least one segment cluster;
taking the average value of core points in each segment cluster as the clustering center of each segment cluster;
calculating the target distance between the query vector and the clustering center of each fragment cluster;
and determining the fragment cluster represented by the cluster center with the smallest target distance in at least one fragment cluster as a target fragment cluster.
In one possible implementation, in encoding an input query term using a pre-training model, the encoding unit 501 is specifically configured to:
preprocessing the query sentence to obtain a word vector of the query sentence;
obtaining a query matrix, a key matrix and a value matrix based on word vector calculation;
calculating attention weights based on the query matrix, the key matrix and the value matrix;
The attention weight is multiplied by the value matrix to obtain an attention vector, and the attention vector is encoded to obtain a query vector.
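The four encoding steps listed above can be sketched with scaled dot-product attention; the projection matrices, the scaling factor and the mean pooling into a single query vector are assumptions of this sketch rather than details fixed by the embodiment:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))   # numerically stable softmax
    return e / e.sum(axis=axis, keepdims=True)

def encode_query(word_vecs, w_q, w_k, w_v):
    """Project the word vectors into query/key/value matrices, compute the
    attention weights, multiply the weights by the value matrix to obtain the
    attention vectors, and pool them into one query vector."""
    q, k, v = word_vecs @ w_q, word_vecs @ w_k, word_vecs @ w_v
    weights = softmax(q @ k.T / np.sqrt(k.shape[-1]))  # attention weights
    attn = weights @ v                                 # attention vectors
    return attn.mean(axis=0)                           # assumed mean pooling
```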
According to one embodiment of the present application, the units of the open domain question-answer prediction apparatus based on the pre-training model shown in fig. 5 may be separately or wholly combined into one or several other units, or one (or some) of the units may be further split into a plurality of functionally smaller units that realize the same operations, without affecting the technical effects of the embodiments of the present application. The above units are divided based on logical functions; in practical applications, the function of one unit may be realized by a plurality of units, or the functions of a plurality of units may be realized by one unit. In other embodiments of the present application, the open domain question-answer prediction apparatus based on the pre-training model may also include other units; in practical applications, these functions may also be realized with the assistance of other units, and may be realized by the cooperation of a plurality of units.
According to another embodiment of the present application, the open domain question-answer prediction apparatus based on a pre-training model shown in fig. 5 may be constructed, and the open domain question-answer prediction method based on a pre-training model of the embodiments of the present application may be implemented, by running a computer program (including program code) capable of executing the steps involved in the methods shown in fig. 2 or fig. 4 on a general-purpose computing device, such as a computer, that includes processing elements and storage elements such as a central processing unit (CPU), a random access memory (RAM) and a read-only memory (ROM). The computer program may be recorded on, for example, a computer-readable recording medium, and loaded into and executed by the above computing device via the computer-readable recording medium.
Based on the description of the method embodiment and the device embodiment, the embodiment of the application also provides electronic equipment. Referring to fig. 6, the electronic device includes at least a processor 601, an input device 602, an output device 603, and a computer storage medium 604. Wherein the processor 601, input device 602, output device 603, and computer storage medium 604 within the electronic device may be connected by a bus or other means.
The computer storage medium 604 may be stored in a memory of an electronic device, the computer storage medium 604 being for storing a computer program comprising program instructions, the processor 601 being for executing the program instructions stored by the computer storage medium 604. The processor 601 (or CPU (Central Processing Unit, central processing unit)) is a computing core as well as a control core of the electronic device, which is adapted to implement one or more instructions, in particular to load and execute one or more instructions to implement a corresponding method flow or a corresponding function.
In one embodiment, the processor 601 of the electronic device provided in the embodiments of the present application may be configured to perform a series of open-domain question-answer prediction processes based on a pre-trained model:
Encoding an input query sentence by adopting a pre-training model to obtain a query vector of the query sentence;
matching the query vector with at least one fragment cluster in a pre-constructed cluster map to determine a target fragment cluster to which the query statement belongs from the at least one fragment cluster;
selecting at least one fragment from the target fragment cluster, obtaining an updated query statement according to the at least one fragment, and calculating the posterior probability of the updated query statement and the first fragment in the target fragment cluster;
repeatedly executing the operations of selecting at least one fragment from the target fragment cluster according to the posterior probability obtained last time and obtaining the current updated query statement according to the at least one fragment until no fragment directly connected with the at least one currently selected fragment exists in the target fragment cluster;
and calculating the target posterior probability of the second segment in the current updated query sentence and target segment cluster, and returning an open domain question-answering result of the query sentence according to the target posterior probability.
It can be seen that, in the electronic device shown in fig. 6, the query vector of the query statement is obtained by encoding the input query statement by using the pre-training model; matching the query vector with at least one fragment cluster in a pre-constructed cluster map to determine a target fragment cluster to which the query statement belongs from the at least one fragment cluster; selecting at least one fragment from the target fragment cluster, obtaining an updated query statement according to the at least one fragment, and calculating the posterior probability of the updated query statement and the first fragment in the target fragment cluster; repeatedly executing the operations of selecting at least one fragment from the target fragment cluster according to the posterior probability obtained last time and obtaining the current updated query statement according to the at least one fragment until no fragment directly connected with the at least one currently selected fragment exists in the target fragment cluster; and calculating the target posterior probability of the second segment in the current updated query sentence and target segment cluster, and returning an open domain question-answering result of the query sentence according to the target posterior probability. When a query statement is input, a target fragment cluster is selected from at least one fragment cluster, the target fragment cluster is used as a database, at least one fragment is screened from layers in the target fragment cluster, and the fragment with the maximum target posterior probability and the fragment which is related to the fragment with the maximum target posterior probability in each layer are returned as open domain question-answering results.
In yet another embodiment, the at least one segment cluster is obtained by clustering segment data of each domain, and the processor 601 is further configured to perform, before encoding the input query term using the pre-training model to obtain a query vector of the query term:
determining a radius and a neighborhood density threshold value adopted by clustering the segment data in each field in a clustering algorithm;
and constructing a cluster map based on the radius and the neighborhood density threshold.
In yet another embodiment, the processor 601 performs determining a radius employed in a clustering algorithm to cluster segment data for each domain, including:
the segment data in each field is encoded by adopting a pre-training model, so as to obtain at least one semantic vector;
carrying out logarithmic sampling on at least one semantic vector to obtain a first target number of points;
calculating the average distance between the points of the first target number of points;
repeatedly executing K times to perform logarithmic sampling on at least one semantic vector and calculating the average distance between points to obtain K average distances between points, wherein K is an integer greater than 1;
the average value of the average distance between K points is taken as the radius.
In yet another embodiment, when determining the neighborhood density threshold used by the clustering algorithm to cluster the segment data of each domain, the processor 601 performs:
logarithmically sampling the at least one semantic vector to obtain a second target number of points;
randomly selecting one of the second target number of points as a cluster center, and counting the points similar to the cluster center according to the radius and a predefined discriminant function;
repeating the operations of logarithmic sampling, random center selection, and similar-point counting K times to obtain K count values;
and taking the mean of the K count values as the neighborhood density threshold.
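The threshold estimation can be sketched the same way. The discriminant function is not specified in the text, so the sketch assumes the simplest choice: a sampled point counts as similar when its Euclidean distance to the center is at most the radius.

```python
import math
import numpy as np

def estimate_min_pts(vectors, radius, K=5, rng=None):
    """K rounds: logarithmically sample points, pick one sampled point as a
    centre, count sampled points within `radius` of it (assumed discriminant:
    Euclidean distance <= radius), then average the K counts."""
    rng = np.random.default_rng(rng)
    X = np.asarray(vectors, dtype=float)
    n_sample = max(2, math.ceil(math.log2(len(X))))
    counts = []
    for _ in range(K):
        idx = rng.choice(len(X), size=n_sample, replace=False)
        S = X[idx]
        centre = S[rng.integers(n_sample)]
        dists = np.linalg.norm(S - centre, axis=1)
        counts.append(int((dists <= radius).sum()))  # the centre counts itself
    return float(np.mean(counts))
```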
In yet another embodiment, when constructing the cluster map based on the radius and the neighborhood density threshold, the processor 601 performs:
starting from any semantic vector of the at least one semantic vector, obtaining the number of neighborhood points of that semantic vector according to the radius; if the number of neighborhood points is smaller than the neighborhood density threshold, the semantic vector is determined to be a boundary point, and if the number of neighborhood points is greater than or equal to the neighborhood density threshold, it is determined to be a core point;
if the semantic vector is a core point, the semantic vector and every point density-reachable from it are grouped into one segment cluster; if the semantic vector is a boundary point, it is added to the segment cluster of a core point from which it is density-reachable; this continues until all core points among the at least one semantic vector have been clustered, yielding at least one segment cluster;
and assigning an edge between the neighborhood points within each of the at least one segment cluster to obtain the cluster map.
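The construction above follows the DBSCAN pattern, with edges added inside each cluster. A compact sketch under the same assumptions as before (Euclidean distance; a point's neighborhood excludes the point itself — the text leaves this detail open):

```python
import numpy as np

def build_cluster_map(vectors, eps, min_pts):
    """Classify points as core or boundary by neighbourhood size, grow
    clusters from core points via density-reachability, attach boundary
    points to a reachable core point's cluster, and add an edge between
    every pair of neighbouring points inside the same cluster."""
    X = np.asarray(vectors, dtype=float)
    n = len(X)
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    neigh = [set(np.flatnonzero(dist[i] <= eps)) - {i} for i in range(n)]
    core = [len(neigh[i]) >= min_pts for i in range(n)]
    label = [-1] * n                      # -1: unassigned / noise
    cid = 0
    for i in range(n):
        if not core[i] or label[i] != -1:
            continue
        label[i] = cid                    # expand a new cluster from core point i
        queue = [i]
        while queue:
            p = queue.pop()
            for q in neigh[p]:
                if label[q] == -1:
                    label[q] = cid        # boundary or core point joins cluster
                    if core[q]:
                        queue.append(q)   # only core points keep expanding
        cid += 1
    # edges between neighbouring points of the same cluster
    edges = {(i, j) for i in range(n) for j in neigh[i]
             if i < j and label[i] == label[j] != -1}
    return label, edges
```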
In yet another embodiment, when matching the query vector with at least one segment cluster in the pre-constructed cluster map to determine, from the at least one segment cluster, the target segment cluster to which the query statement belongs, the processor 601 performs:
calculating, for each of the at least one segment cluster, the average of the core points in that segment cluster;
taking the average of the core points in each segment cluster as the cluster center of that segment cluster;
calculating the target distance between the query vector and the cluster center of each segment cluster;
and determining, as the target segment cluster, the segment cluster whose cluster center has the smallest target distance.
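The matching step reduces to a nearest-centre lookup. A sketch, again assuming Euclidean distance for the "target distance":

```python
import numpy as np

def match_target_cluster(query_vec, clusters):
    """clusters: list of arrays, each holding one cluster's core points.
    The cluster centre is the mean of the core points; the cluster whose
    centre is nearest the query vector is the target segment cluster."""
    q = np.asarray(query_vec, dtype=float)
    centres = [np.mean(np.asarray(core_pts, dtype=float), axis=0)
               for core_pts in clusters]
    dists = [np.linalg.norm(q - c) for c in centres]
    return int(np.argmin(dists))          # index of the target segment cluster
```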
In yet another embodiment, when encoding the input query statement with the pre-training model to obtain the query vector of the query statement, the processor 601 performs:
preprocessing the query statement to obtain word vectors of the query statement;
calculating a query matrix, a key matrix, and a value matrix from the word vectors;
calculating attention weights based on the query matrix, the key matrix, and the value matrix;
and multiplying the attention weights by the value matrix to obtain an attention vector, then encoding the attention vector to obtain the query vector.
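The encoding steps above describe standard scaled dot-product self-attention. A single-head sketch, where the final "encoding the attention vector" is approximated by mean pooling (an assumption — the patent does not specify the pooling):

```python
import numpy as np

def encode_query(word_vecs, Wq, Wk, Wv):
    """Project word vectors into query/key/value matrices, softmax the
    scaled dot-product scores into attention weights, multiply by the
    value matrix, then pool the attention vectors into one query vector."""
    Xw = np.asarray(word_vecs, dtype=float)
    Q, K, V = Xw @ Wq, Xw @ Wk, Xw @ Wv
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)   # attention weights
    attn = weights @ V                                        # attention vectors
    return attn.mean(axis=0)          # assumed pooling into the query vector
```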
By way of example, the electronic device includes, but is not limited to, a processor 601, an input device 602, an output device 603, and a computer storage medium 604, and may further include a memory, a power supply, an application client module, and the like. The input device 602 may be a keyboard, a touch screen, a radio-frequency receiver, etc., and the output device 603 may be a speaker, a display, a radio-frequency transmitter, etc. Those skilled in the art will appreciate that the schematic diagram is merely an example of an electronic device and does not limit it; the device may include more or fewer components than shown, certain components may be combined, or different components may be used.
It should be noted that, since the steps in the above-mentioned open-domain question-answer prediction method based on the pre-training model are implemented when the processor 601 of the electronic device executes the computer program, the embodiments of the above-mentioned open-domain question-answer prediction method based on the pre-training model are applicable to the electronic device, and the same or similar beneficial effects can be achieved.
The embodiment of the application also provides a computer storage medium (memory), which is a memory device in the electronic device used to store programs and data. It will be appreciated that the computer storage medium here may include both a storage medium built into the terminal and an extended storage medium supported by the terminal. The computer storage medium provides a storage space that stores the operating system of the terminal. Also stored in this storage space are one or more instructions, which may be one or more computer programs (including program code), adapted to be loaded and executed by the processor 601. The computer storage medium here may be a high-speed RAM or a non-volatile memory, such as at least one magnetic disk memory; alternatively, it may be at least one computer storage medium located remotely from the processor 601. In one embodiment, one or more instructions stored in the computer storage medium may be loaded and executed by the processor 601 to implement the corresponding steps of the open-domain question-answer prediction method based on the pre-training model described above.
Illustratively, the computer program of the computer storage medium may include computer program code, which may be in source code form, object code form, an executable file, or some intermediate form, etc. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a Read-Only Memory (ROM), a Random Access Memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and so forth.
It should be noted that, since the steps in the open-domain question-answer prediction method based on the pre-training model are implemented when the computer program of the computer storage medium is executed by the processor, all embodiments of the open-domain question-answer prediction method based on the pre-training model are applicable to the computer storage medium, and the same or similar beneficial effects can be achieved.
The foregoing describes the embodiments of the present application in detail. Specific examples are used herein to illustrate the principles and implementations of the present application, and the above description of the embodiments is intended only to help understand the method of the present application and its core idea. Meanwhile, those skilled in the art may make changes to the specific implementations and the application scope according to the ideas of the present application. In summary, the content of this description should not be construed as limiting the present application.

Claims (10)

1. An open domain question-answer prediction method based on a pre-training model, which is characterized by comprising the following steps:
encoding an input query sentence by adopting a pre-training model to obtain a query vector of the query sentence;
matching the query vector with at least one fragment cluster in a pre-constructed cluster map to determine a target fragment cluster to which the query statement belongs from the at least one fragment cluster;
selecting at least one segment from the target segment cluster, obtaining an updated query statement according to the at least one segment, and calculating the posterior probability of the updated query statement and the first segment in the target segment cluster; the first fragment refers to fragments except at least one selected fragment in the target fragment cluster;
the selecting at least one segment from the target segment cluster, and obtaining an updated query statement according to the at least one segment includes:
calculating posterior probability of each segment in the query vector and the target segment cluster; selecting at least one segment with a posterior probability greater than or equal to a preset value from the target segment cluster; combining the selected at least one segment with the query statement to obtain an updated query statement;
Repeatedly executing the operations of selecting at least one fragment from the target fragment cluster according to the posterior probability obtained last time and obtaining a current updated query statement according to the at least one fragment until no fragment directly connected with the at least one currently selected fragment exists in the target fragment cluster;
the repeatedly executing the operations of selecting at least one segment from the target segment cluster according to the posterior probability obtained last time and obtaining the current updated query statement according to the at least one segment comprises the following steps:
selecting at least one segment with the posterior probability greater than or equal to a preset value from the first segment according to the posterior probability obtained by the last calculation; combining the selected at least one fragment with the last updated query sentence to obtain a current updated query sentence;
calculating the target posterior probability of the second segment in the current updated query sentence and the target segment cluster, and returning an open domain question-answer result of the query sentence according to the target posterior probability; the second segment refers to the segments of the target segment cluster except for the segments currently combined with the query statement updated last time.
2. The method of claim 1, wherein the at least one fragment cluster is obtained by clustering fragment data of each domain, and wherein prior to encoding an input query term using a pre-training model to obtain a query vector for the query term, the method further comprises:
determining a radius and a neighborhood density threshold value adopted by clustering the segment data in each field in a clustering algorithm;
and constructing the cluster map based on the radius and the neighborhood density threshold.
3. The method according to claim 2, wherein determining the radius used for clustering segment data of each domain in the clustering algorithm comprises:
the pre-training model is adopted to encode fragment data in each field to obtain at least one semantic vector;
carrying out logarithmic sampling on the at least one semantic vector to obtain a first target number of points;
calculating the average distance between the points of the first target number of points;
repeatedly executing K times to perform logarithmic sampling on the at least one semantic vector and calculating the average distance between points to obtain K average distances between points, wherein K is an integer greater than 1;
And taking the average value of the average distances among the K points as the radius.
4. A method according to claim 3, wherein determining a neighborhood density threshold for clustering segment data for each domain in the clustering algorithm comprises:
carrying out logarithmic sampling on the at least one semantic vector to obtain a second target number of points;
randomly selecting one point from the second target number of points as a clustering center, and calculating the number of similar points of the clustering center according to the radius and a predefined discriminant function;
repeatedly executing operations of carrying out logarithmic sampling on the at least one semantic vector for K times, randomly selecting a point as a clustering center and calculating the number of similar points of the clustering center to obtain K number values;
and taking the average value of the K number values as the neighborhood density threshold value.
5. A method according to claim 3, wherein said constructing said cluster map based on said radius and said neighborhood density threshold comprises:
starting from any semantic vector of the at least one semantic vector, obtaining the number of neighborhood points of that semantic vector according to the radius; determining the semantic vector to be a boundary point if the number of neighborhood points is smaller than the neighborhood density threshold, and determining it to be a core point if the number of neighborhood points is greater than or equal to the neighborhood density threshold;
if the semantic vector is a core point, determining the semantic vector and the points density-reachable from it as a segment cluster; if the semantic vector is a boundary point, adding it to the segment cluster to which a core point density-reachable from it belongs, until the core points among the at least one semantic vector have been clustered, so as to obtain at least one segment cluster;
and giving an edge to the neighborhood point in each segment cluster in the at least one segment cluster to obtain the cluster map.
6. The method according to any one of claims 1-4, wherein said matching the query vector with at least one segment cluster in a pre-constructed cluster map to determine a target segment cluster to which the query statement belongs from the at least one segment cluster comprises:
calculating, for each of the at least one segment cluster, an average value of core points in the each segment cluster;
taking the average value of the core points in each segment cluster as the clustering center of each segment cluster;
calculating the target distance between the query vector and the clustering center of each fragment cluster;
And determining the fragment cluster represented by the cluster center with the smallest target distance in the at least one fragment cluster as the target fragment cluster.
7. The method of claim 1, wherein the encoding the input query term using the pre-training model to obtain a query vector for the query term comprises:
preprocessing the query sentence to obtain a word vector of the query sentence;
obtaining a query matrix, a key matrix and a value matrix based on the word vector calculation;
calculating attention weights based on the query matrix, the key matrix and the value matrix;
multiplying the attention weight by the value matrix to obtain an attention vector, and encoding the attention vector to obtain the query vector.
8. An open domain question-answer prediction apparatus based on a pre-training model, the apparatus comprising:
the coding unit is used for coding the input query statement by adopting a pre-training model to obtain a query vector of the query statement;
the matching unit is used for matching the query vector with at least one fragment cluster in a pre-constructed cluster map so as to determine a target fragment cluster to which the query statement belongs from the at least one fragment cluster;
The updating unit is used for selecting at least one fragment from the target fragment cluster, obtaining an updated query statement according to the at least one fragment, and calculating the posterior probability of the updated query statement and the first fragment in the target fragment cluster; the first fragment refers to fragments except at least one selected fragment in the target fragment cluster;
the updating unit is specifically configured to: calculating posterior probability of each segment in the query vector and the target segment cluster; selecting at least one segment with a posterior probability greater than or equal to a preset value from the target segment cluster; combining the selected at least one segment with the query statement to obtain an updated query statement;
the updating unit is further configured to repeatedly perform an operation of selecting at least one segment from the target segment cluster according to the posterior probability obtained last time, and obtaining a current updated query statement according to the at least one segment until no segment directly connected to the at least one currently selected segment exists in the target segment cluster;
the updating unit is specifically configured to: selecting at least one segment with the posterior probability greater than or equal to a preset value from the first segment according to the posterior probability obtained by the last calculation; combining the selected at least one fragment with the last updated query sentence to obtain a current updated query sentence;
The prediction unit is used for calculating the target posterior probability of the second segment in the current updated query statement and the target segment cluster, and returning an open domain question-answer result of the query statement according to the target posterior probability; the second segment refers to the segments of the target segment cluster except for the segments currently combined with the query statement updated last time.
9. An electronic device comprising an input device and an output device, further comprising:
a processor adapted to implement one or more instructions; the method comprises the steps of,
a computer storage medium storing one or more instructions adapted to be loaded by the processor and to perform the method of any one of claims 1-7.
10. A computer storage medium storing one or more instructions adapted to be loaded by a processor and to perform the method of any one of claims 1-7.
CN202111167748.7A 2021-09-30 2021-09-30 Open domain question-answer prediction method based on pre-training model and related equipment Active CN113723115B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111167748.7A CN113723115B (en) 2021-09-30 2021-09-30 Open domain question-answer prediction method based on pre-training model and related equipment


Publications (2)

Publication Number Publication Date
CN113723115A CN113723115A (en) 2021-11-30
CN113723115B true CN113723115B (en) 2024-02-09

Family

ID=78685636

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111167748.7A Active CN113723115B (en) 2021-09-30 2021-09-30 Open domain question-answer prediction method based on pre-training model and related equipment

Country Status (1)

Country Link
CN (1) CN113723115B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114238329A (en) * 2021-12-15 2022-03-25 平安科技(深圳)有限公司 Vector similarity calculation method, device, equipment and storage medium
CN115687031A (en) * 2022-11-15 2023-02-03 北京优特捷信息技术有限公司 Method, device, equipment and medium for generating alarm description text

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110750629A (en) * 2019-09-18 2020-02-04 平安科技(深圳)有限公司 Robot dialogue generation method and device, readable storage medium and robot
CN112487173A (en) * 2020-12-18 2021-03-12 北京百度网讯科技有限公司 Man-machine conversation method, device and storage medium
KR20210051523A (en) * 2019-10-30 2021-05-10 주식회사 솔트룩스 Dialogue system by automatic domain classfication
CN113139042A (en) * 2021-04-25 2021-07-20 内蒙古工业大学 Emotion controllable reply generation method using fine-tuning and reordering strategy
WO2021169842A1 (en) * 2020-02-24 2021-09-02 京东方科技集团股份有限公司 Method and apparatus for updating data, electronic device, and computer readable storage medium

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9146987B2 (en) * 2013-06-04 2015-09-29 International Business Machines Corporation Clustering based question set generation for training and testing of a question and answer system
CN103914548B (en) * 2014-04-10 2018-01-09 北京百度网讯科技有限公司 Information search method and device
US10423649B2 (en) * 2017-04-06 2019-09-24 International Business Machines Corporation Natural question generation from query data using natural language processing system


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Tianjiao Guo, "Course Question Answering System Based on Artificial Intelligence," Application of Intelligent Systems in Multi-modal Information Analytics, 2021 International Conference on Multi-modal Information Analytics (MMIA 2021), Advances in Intelligent Systems and Computing, vol. 2, pp. 723-730. *



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant