CN113723115A - Open domain question-answer prediction method based on pre-training model and related equipment - Google Patents


Info

Publication number
CN113723115A
CN113723115A (application number CN202111167748.7A; granted publication CN113723115B)
Authority
CN
China
Prior art keywords
cluster, fragment, target, segment, query statement
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202111167748.7A
Other languages
Chinese (zh)
Other versions
CN113723115B (en)
Inventor
成杰峰
彭奕
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd
Priority: CN202111167748.7A
Publication of CN113723115A
Application granted; publication of CN113723115B
Legal status: Active

Classifications

    • G06F40/35 Discourse or dialogue representation (G06F40 Handling natural language data)
    • G06F16/2433 Query languages (G06F16 Information retrieval)
    • G06F16/335 Filtering based on additional data, e.g. user or group profiles
    • G06F16/35 Clustering; Classification
    • G06F16/367 Ontology
    • G06F40/284 Lexical analysis, e.g. tokenisation or collocates
    • G06N3/04 Neural network architecture, e.g. interconnection topology
    • G06N3/08 Neural network learning methods
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management


Abstract

The application relates to the technical field of artificial intelligence and provides an open-domain question-answer prediction method based on a pre-trained model, together with related equipment. The method comprises the following steps: encoding a query statement to obtain a query vector; matching the query vector against at least one segment cluster to determine the target segment cluster to which the query statement belongs; selecting at least one segment from the target segment cluster, obtaining an updated query statement from the at least one segment, and calculating the posterior probability of the updated query statement against the segments in the target segment cluster; repeating the operations of selecting at least one segment according to the posterior probability and updating the query statement accordingly, until the target segment cluster contains no segment directly connected to the currently selected at least one segment; and calculating the posterior probability of the latest query statement against the segments in the target segment cluster, and returning a question-answer result according to the posterior probability. The method and apparatus help improve prediction efficiency in open-domain question answering.

Description

Open domain question-answer prediction method based on pre-training model and related equipment
Technical Field
The application relates to the technical field of intelligent question answering, in particular to an open domain question answering prediction method based on a pre-training model and related equipment.
Background
With the development of the internet, business volume across industries has surged and customer bases have gradually shifted from offline to online; the number of human customer-service agents and their processing capacity cannot keep up with this growth in online customers, so intelligent question-answering systems are urgently needed to relieve the pressure. Most existing intelligent question-answering systems are closed-domain, i.e. the question-answer knowledge base is restricted to one specific domain, such as banking, insurance or medical consultation. Driven by customer demand, researchers have proposed open-domain question answering (open-domain QA), which is not limited to a single domain but learns knowledge from massive text documents across industries (such as knowledge bases like Wikipedia), and can therefore answer questions from any domain. In existing open-domain question-answering systems, however, the posterior probability of a query statement must be calculated against massive numbers of segments one by one before the high-probability segments are extracted, which is computationally expensive.
Disclosure of Invention
In order to solve the above problem, the application provides an open-domain question-answer prediction method based on a pre-trained model and related equipment, which helps improve prediction efficiency in open-domain question answering.
In order to achieve the above object, a first aspect of the embodiments of the present application provides an open-domain question-answer prediction method based on a pre-training model, where the method includes:
coding the input query statement by adopting a pre-training model to obtain a query vector of the query statement;
matching the query vector with at least one segment cluster in a pre-constructed clustering graph to determine a target segment cluster to which the query statement belongs from the at least one segment cluster;
selecting at least one fragment from the target fragment cluster, obtaining an updated query statement according to the at least one fragment, and calculating the posterior probability of the updated query statement and a first fragment in the target fragment cluster;
repeatedly executing the operation of selecting at least one segment from the target segment cluster according to the last obtained posterior probability and obtaining the current updated query statement according to the at least one segment until no segment directly connected with the at least one currently selected segment exists in the target segment cluster;
and calculating the target posterior probability of the current updated query statement and the second segment in the target segment cluster, and returning the open domain question-answer result of the query statement according to the target posterior probability.
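The five steps above can be sketched end to end as follows. This is a minimal illustration under stated assumptions: the `toy_encode` bag-of-letters encoder and dot-product `toy_posterior` merely stand in for the pre-training model and the real posterior computation, and all helper names and toy data are hypothetical, not from the patent.

```python
import numpy as np

def predict_answer(query, encode, clusters, centers, posterior, edges):
    # Step 1: encode the query statement into a query vector.
    q = encode(query)
    # Step 2: match the query vector against each cluster center (Euclidean).
    target = min(range(len(centers)), key=lambda i: np.linalg.norm(q - centers[i]))
    # Step 3: select the highest-posterior segment and update the query with it.
    current = max(clusters[target], key=lambda s: posterior(query, s))
    path = [current]
    query = query + " " + current
    # Step 4: repeat while a directly connected segment still exists.
    while edges.get(current):
        current = max(edges[current], key=lambda s: posterior(query, s))
        path.append(current)
        query = query + " " + current
    # Step 5: the accumulated segment path stands in for the returned result.
    return path

# Toy stand-ins: bag-of-letters encoder and dot-product "posterior".
def toy_encode(text):
    v = np.zeros(26)
    for ch in text.lower():
        if ch.isalpha():
            v[ord(ch) - 97] += 1
    return v

def toy_posterior(query, segment):
    return float(toy_encode(query) @ toy_encode(segment))

clusters = [["cats purr", "dogs bark"], ["stocks rise"]]
centers = [toy_encode("cats dogs"), toy_encode("stocks")]
edges = {"cats purr": ["cats sleep a lot"]}
path = predict_answer("do cats purr", toy_encode, clusters, centers, toy_posterior, edges)
```

The greedy walk over directly connected segments is what replaces the one-by-one scoring of all segments criticized in the background section.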
With reference to the first aspect, in a possible implementation manner, at least one fragment cluster is obtained by clustering fragment data of each field, and before encoding an input query statement by using a pre-training model to obtain a query vector of the query statement, the method further includes:
determining a radius and a neighborhood density threshold value adopted for clustering fragment data of each field in a clustering algorithm;
and constructing a clustering graph based on the radius and the neighborhood density threshold.
With reference to the first aspect, in a possible implementation manner, the determining a radius used for clustering fragment data of each field in a clustering algorithm includes:
coding fragment data of each field by adopting a pre-training model to obtain at least one semantic vector;
carrying out logarithmic sampling on at least one semantic vector to obtain a first target number of points;
calculating the average distance between the points of the first target number of points;
repeating the operations of logarithmic sampling and average-distance calculation on the at least one semantic vector K times to obtain K average distances, wherein K is an integer greater than 1;
and taking the average of the K average distances as the radius.
With reference to the first aspect, in a possible implementation manner, determining a neighborhood density threshold used for clustering fragment data of each field in a clustering algorithm includes:
carrying out logarithmic sampling on at least one semantic vector to obtain a second target number of points;
randomly selecting one point from the second target quantity points as a clustering center, and calculating the quantity of the same type points of the clustering center according to the radius and a predefined discriminant function;
repeating the operations of logarithmic sampling on the at least one semantic vector, randomly selecting a point as the cluster center and counting the same-category points of the cluster center K times to obtain K count values;
and taking the average of the K count values as the neighborhood density threshold.
With reference to the first aspect, in one possible implementation manner, constructing the clustering graph based on a radius and a neighborhood density threshold includes:
starting from any one of the at least one semantic vector, obtaining the number of neighborhood points of that semantic vector within the radius; if the number of neighborhood points is less than the neighborhood density threshold, marking the semantic vector as a boundary point, and if it is greater than or equal to the neighborhood density threshold, marking it as a core point;
if the semantic vector is a core point, determining it together with all points density-reachable from it as a segment cluster; if it is a boundary point, adding it to the segment cluster of a core point from which it is density-reachable; continuing until every core point among the at least one semantic vector has been clustered, thereby obtaining at least one segment cluster;
and assigning an edge between the neighborhood points within each of the at least one segment cluster to obtain the clustering graph.
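The construction just described (core/boundary points by neighborhood density, clusters grown by density reachability, then edges between neighborhood points) closely mirrors DBSCAN. A self-contained sketch under that reading, not the patent's actual implementation:

```python
import numpy as np
from collections import deque

def build_cluster_graph(vectors, eps, min_pts):
    """DBSCAN-style clustering plus intra-cluster edge assignment (a sketch)."""
    n = len(vectors)
    # Pairwise Euclidean distances and Eps-neighborhoods (excluding the point itself).
    dist = np.linalg.norm(vectors[:, None, :] - vectors[None, :, :], axis=-1)
    neighbors = [set(np.flatnonzero(dist[i] <= eps)) - {i} for i in range(n)]
    core = [len(neighbors[i]) >= min_pts for i in range(n)]
    labels = [-1] * n
    cluster_id = 0
    for i in range(n):
        if labels[i] != -1 or not core[i]:
            continue
        # Grow one segment cluster from this core point via density reachability.
        queue = deque([i])
        labels[i] = cluster_id
        while queue:
            j = queue.popleft()
            for k in neighbors[j]:
                if labels[k] == -1:
                    labels[k] = cluster_id   # boundary or core point joins the cluster
                    if core[k]:
                        queue.append(k)
        cluster_id += 1
    # Assign an edge to every pair of neighborhood points in the same cluster.
    edges = {(i, j) for i in range(n) for j in neighbors[i]
             if i < j and labels[i] != -1 and labels[i] == labels[j]}
    return labels, edges

pts = np.array([[0.0, 0.0], [0.0, 0.1], [0.1, 0.0],
                [5.0, 5.0], [5.0, 5.1], [5.1, 5.0]])
labels, edges = build_cluster_graph(pts, eps=0.5, min_pts=2)
```

With the toy data above, the two tight point groups form two clusters, each fully connected by edges.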
With reference to the first aspect, in a possible implementation manner, matching the query vector with at least one segment cluster in a pre-constructed clustering graph to determine a target segment cluster to which the query statement belongs from the at least one segment cluster includes:
calculating, for each of the at least one segment cluster, an average of the core points in each segment cluster;
taking the average value of the core points in each fragment cluster as the clustering center of each fragment cluster;
calculating the target distance between the query vector and the clustering center of each fragment cluster;
and determining the segment cluster represented by the cluster center with the minimum target distance in the at least one segment cluster as the target segment cluster.
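A minimal sketch of this matching step, assuming each cluster center is the simple mean of that cluster's core-point vectors (function names are illustrative):

```python
import numpy as np

def match_target_cluster(query_vec, core_points_per_cluster):
    # Cluster center = average of the core points in each segment cluster.
    centers = [np.mean(core, axis=0) for core in core_points_per_cluster]
    # Target cluster = the one whose center is nearest to the query vector.
    dists = [np.linalg.norm(query_vec - c) for c in centers]
    return int(np.argmin(dists))

cores = [np.array([[0.0, 0.0], [2.0, 0.0]]),
         np.array([[10.0, 10.0], [12.0, 10.0]])]
target = match_target_cluster(np.array([1.0, 0.5]), cores)
```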
With reference to the first aspect, in a possible implementation manner, encoding an input query statement by using a pre-training model to obtain a query vector of the query statement includes:
preprocessing a query statement to obtain a word vector of the query statement;
calculating based on the word vectors to obtain a query matrix, a key matrix and a value matrix;
calculating to obtain attention weight based on the query matrix, the key matrix and the value matrix;
and multiplying the attention weight by the value matrix to obtain an attention vector, and encoding the attention vector to obtain a query vector.
A second aspect of the embodiments of the present application provides an open domain question-answer prediction apparatus based on a pre-training model, where the apparatus includes:
the coding unit is used for coding the input query statement by adopting a pre-training model to obtain a query vector of the query statement;
the matching unit is used for matching the query vector with at least one segment cluster in a pre-constructed clustering map so as to determine a target segment cluster to which the query statement belongs from the at least one segment cluster;
the updating unit is used for selecting at least one segment from the target segment cluster, obtaining an updated query statement according to the at least one segment, and calculating the posterior probability of the updated query statement and the first segment in the target segment cluster;
the updating unit is further used for repeatedly executing the operation of selecting at least one fragment from the target fragment cluster according to the last obtained posterior probability and obtaining the current updated query statement according to the at least one fragment until the target fragment cluster does not have the fragment directly connected with the at least one currently selected fragment;
and the prediction unit is used for calculating the target posterior probability of the current updated query statement and the second segment in the target segment cluster, and returning the open domain question-answer result of the query statement according to the target posterior probability.
A third aspect of the embodiments of the present application provides an electronic device, which includes an input device, an output device, a processor adapted to implement one or more instructions, and a computer storage medium storing one or more instructions adapted to be loaded by the processor to perform the following steps:
coding the input query statement by adopting a pre-training model to obtain a query vector of the query statement;
matching the query vector with at least one segment cluster in a pre-constructed clustering graph to determine a target segment cluster to which the query statement belongs from the at least one segment cluster;
selecting at least one fragment from the target fragment cluster, obtaining an updated query statement according to the at least one fragment, and calculating the posterior probability of the updated query statement and a first fragment in the target fragment cluster;
repeatedly executing the operation of selecting at least one segment from the target segment cluster according to the last obtained posterior probability and obtaining the current updated query statement according to the at least one segment until no segment directly connected with the at least one currently selected segment exists in the target segment cluster;
and calculating the target posterior probability of the current updated query statement and the second segment in the target segment cluster, and returning the open domain question-answer result of the query statement according to the target posterior probability.
A fourth aspect of embodiments of the present application provides a computer storage medium having one or more instructions stored thereon, the one or more instructions adapted to be loaded by a processor and to perform the following steps:
coding the input query statement by adopting a pre-training model to obtain a query vector of the query statement;
matching the query vector with at least one segment cluster in a pre-constructed clustering graph to determine a target segment cluster to which the query statement belongs from the at least one segment cluster;
selecting at least one fragment from the target fragment cluster, obtaining an updated query statement according to the at least one fragment, and calculating the posterior probability of the updated query statement and a first fragment in the target fragment cluster;
repeatedly executing the operation of selecting at least one segment from the target segment cluster according to the last obtained posterior probability and obtaining the current updated query statement according to the at least one segment until no segment directly connected with the at least one currently selected segment exists in the target segment cluster;
and calculating the target posterior probability of the current updated query statement and the second segment in the target segment cluster, and returning the open domain question-answer result of the query statement according to the target posterior probability.
The above scheme of the present application includes at least the following beneficial effects:
in the embodiment of the application, the input query statement is coded by adopting a pre-training model to obtain a query vector of the query statement; matching the query vector with at least one segment cluster in a pre-constructed clustering graph to determine a target segment cluster to which the query statement belongs from the at least one segment cluster; selecting at least one fragment from the target fragment cluster, obtaining an updated query statement according to the at least one fragment, and calculating the posterior probability of the updated query statement and a first fragment in the target fragment cluster; repeatedly executing the operation of selecting at least one segment from the target segment cluster according to the last obtained posterior probability and obtaining the current updated query statement according to the at least one segment until no segment directly connected with the at least one currently selected segment exists in the target segment cluster; and calculating the target posterior probability of the current updated query statement and the second segment in the target segment cluster, and returning the open domain question-answer result of the query statement according to the target posterior probability. The method includes the steps of firstly clustering fragment data to obtain a clustering graph, when query sentences are input, selecting a target fragment cluster from at least one fragment cluster, using the target fragment cluster as a database, screening at least one fragment from a layer in the target fragment cluster, and returning a fragment with the maximum target posterior probability and a fragment related to the fragment with the maximum target posterior probability in each layer as an open domain question-answer result.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present application, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.
Fig. 1 is a schematic diagram of an application environment provided in an embodiment of the present application;
fig. 2 is a schematic flowchart of an open-domain question-answer prediction method based on a pre-training model according to an embodiment of the present application;
FIG. 3 is a schematic diagram of a cluster map generation provided in an embodiment of the present application;
fig. 4 is a schematic flowchart of another open-domain question-answer prediction method based on a pre-training model according to an embodiment of the present application;
fig. 5 is a schematic structural diagram of an open-domain question-answer prediction apparatus based on a pre-training model according to an embodiment of the present application;
fig. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
In order to make the technical solutions better understood by those skilled in the art, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only partial embodiments of the present application, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The terms "comprising" and "having," and any variations thereof, as appearing in the specification, claims and drawings of this application, are intended to cover non-exclusive inclusions. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those steps or elements listed, but may alternatively include other steps or elements not listed, or inherent to such process, method, article, or apparatus. Furthermore, the terms "first," "second," and "third," etc. are used to distinguish between different objects and are not used to describe a particular order.
An embodiment of the present application provides an open domain question-answer prediction method based on a pre-training model, which may be implemented based on an application environment shown in fig. 1, please refer to fig. 1, where the application environment includes an electronic device and a user device connected to the electronic device through a network. Wherein the user equipment is provided with an input interface for receiving a query sentence input by a user, such as a consultation question sentence of the user for commodity details, and a communication interface for transmitting the query sentence to the electronic equipment. The electronic equipment receives the query statement through a communication interface of the electronic equipment, and transmits the query statement to the processor, so that the processor executes the open domain question-answer prediction method based on the pre-training model. Because the electronic equipment reduces the query range to the target segment cluster, and does not need to query in each segment cluster, the query calculation amount is greatly reduced, and the prediction efficiency in open domain question answering is favorably improved.
For example, the electronic device may be an independent server, or may be a cloud server that provides basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, web services, cloud communication, middleware services, domain name services, security services, a Content Delivery Network (CDN), and big data and artificial intelligence platforms. The user device may be a smartphone, a computer, a wearable device, a vehicle-mounted device, or the like.
Based on the application environment shown in fig. 1, the following describes in detail an open domain question-answering prediction method based on a pre-training model provided in an embodiment of the present application with reference to other drawings.
Referring to fig. 2, fig. 2 is a schematic flowchart of an open domain question-answer prediction method based on a pre-training model according to an embodiment of the present application. The method is applied to an electronic device and, as shown in fig. 2, comprises steps 201 to 205:
201: and coding the input query statement by adopting a pre-training model to obtain a query vector of the query statement.
In an embodiment of the present application, the pre-training model may be a BERT (Bidirectional Encoder Representations from Transformers) model, where the BERT model is pre-trained and fine-tuned in advance on data from each domain, so that the model can learn deep information from the data of each domain. Illustratively, encoding the input query statement using the pre-training model to obtain the query vector of the query statement includes:
preprocessing a query statement to obtain a word vector of the query statement;
calculating based on the word vectors to obtain a query matrix, a key matrix and a value matrix;
calculating to obtain attention weight based on the query matrix, the key matrix and the value matrix;
and multiplying the attention weight by the value matrix to obtain an attention vector, and encoding the attention vector to obtain a query vector.
It should be understood that the BERT model encodes with a Transformer encoder. The bottom encoder preprocesses the input query statement (Query) to obtain corresponding word vectors; for example, the preprocessing may be word embedding or one-hot encoding. The self-attention layer of the Transformer encoder constructs a corresponding query vector q, key vector k and value vector v based on the word vectors, and multiplies them respectively by the pre-trained weight matrices W_q, W_k and W_v to obtain the query matrix Q, key matrix K and value matrix V. The attention weight alpha is then calculated based on the query, key and value matrices, and finally the attention weight alpha is multiplied by the value matrix V to obtain the attention vector output by the self-attention layer, which is encoded by a feed-forward neural network to obtain the query vector.
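A compact numeric sketch of the attention computation described here, using single-head scaled dot-product attention with random stand-ins for the pre-trained weight matrices W_q, W_k and W_v (the softmax normalisation is assumed as part of computing the attention weight alpha):

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product attention over token embeddings X."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    # Softmax over each row yields the attention weights alpha.
    alpha = np.exp(scores - scores.max(axis=-1, keepdims=True))
    alpha = alpha / alpha.sum(axis=-1, keepdims=True)
    # Multiply the attention weights by the value matrix to get attention vectors.
    return alpha @ V

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))                      # 4 token embeddings, dimension 8
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)
```

In BERT the output would then pass through the encoder's feed-forward sub-layer; here the random weights only illustrate the shape of the computation.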
202: and matching the query vector with at least one segment cluster in a pre-constructed clustering graph to determine a target segment cluster to which the query statement belongs from the at least one segment cluster.
In a specific embodiment of the present application, at least one fragment cluster is obtained by clustering fragment data in each field, and before encoding an input query statement using a pre-training model to obtain a query vector of the query statement, the method further includes:
determining a radius and a neighborhood density threshold value adopted for clustering fragment data of each field in a clustering algorithm;
and constructing a clustering graph based on the radius and the neighborhood density threshold.
For example, determining the radius used for clustering the fragment data of each field in the clustering algorithm includes:
coding fragment data of each field by adopting a pre-training model to obtain at least one semantic vector;
carrying out logarithmic sampling on at least one semantic vector to obtain a first target number of points;
calculating the average distance between the points of the first target number of points;
repeating the operations of logarithmic sampling and average-distance calculation on the at least one semantic vector K times to obtain K average distances, wherein K is an integer greater than 1;
and taking the average of the K average distances as the radius.
Specifically, assuming the number of the at least one semantic vector is N, the number of points obtained by each logarithmic sampling is [ln N], i.e. the first target number is [ln N]. For any two of the [ln N] points P_i and P_j, using the Euclidean distance as the distance metric, the average distance between the [ln N] points can be calculated as:

eps = ( 2 / ([ln N] * ([ln N] - 1)) ) * Σ_{i<j} dist(P_i, P_j)

where eps denotes the average distance between the [ln N] points, and dist(P_i, P_j) denotes the Euclidean distance between points P_i and P_j.
For the at least one semantic vector, in order to avoid sampling imbalance, the operations of logarithmic sampling and average-distance calculation are repeated K times to obtain K average distances, and their mean is taken as the neighborhood radius Eps:

Eps = (eps_1 + eps_2 + … + eps_K) / K

where eps_1, eps_2, …, eps_K are the average point distances from the 1st, 2nd, …, Kth sampling rounds, and Eps denotes the final neighborhood radius. Sampling [ln N] points is chosen in this embodiment because the number of samples in an open-domain scenario is large, usually tens of millions or more; computing all pairwise distances would be extremely expensive (on the order of N^2), so logarithmic sampling significantly reduces the amount of computation, while repeating the sampling K times mitigates the problem of sampling imbalance.
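A sketch of this Eps estimation under the stated procedure (sample [ln N] points, average their pairwise Euclidean distances, repeat K times); function and parameter names are illustrative:

```python
import math
import numpy as np

def estimate_eps(vectors, K=5, seed=0):
    """Estimate the neighborhood radius by K rounds of logarithmic sampling."""
    rng = np.random.default_rng(seed)
    n = len(vectors)
    m = max(2, int(math.log(n)))                 # [ln N] points per round
    round_means = []
    for _ in range(K):
        idx = rng.choice(n, size=m, replace=False)
        pts = vectors[idx]
        d = np.linalg.norm(pts[:, None] - pts[None, :], axis=-1)
        # Mean over the m*(m-1)/2 distinct point pairs.
        round_means.append(d[np.triu_indices(m, k=1)].mean())
    return float(np.mean(round_means))
```

Each round touches only [ln N] points, so the cost stays logarithmic in N per round rather than quadratic.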
For example, determining a neighborhood density threshold for clustering fragment data of each domain in a clustering algorithm includes:
carrying out logarithmic sampling on at least one semantic vector to obtain a second target number of points;
randomly selecting one point from the second target quantity points as a clustering center, and calculating the quantity of the same type points of the clustering center according to the radius and a predefined discriminant function;
repeatedly performing K times of logarithmic sampling on at least one semantic vector, randomly selecting a point as a clustering center and calculating the number of the similar points of the clustering center to obtain K number values;
and taking the average value of the K number of magnitude values as a neighborhood density threshold value.
Specifically, in order to reduce the amount of calculation, when a neighborhood density threshold is determined, logarithmic sampling is also performed on at least one semantic vector to obtain [ ln N ] points, that is, the number of second targets is [ ln N ], then a point X is randomly selected from the [ ln N ] points as a clustering center, and then the number of points belonging to the same category as the clustering center is calculated based on a previously determined radius parameter Eps and a predefined discriminant function, where the discriminant function is defined as:
Figure BDA0003289970250000091
where u represents the ratio of the two-point spacing to Eps, and the discriminant function D (u) represents: for a point, if the distance between the point nearby and the point nearby is less than Eps, the point is the same kind of point. The calculation formula of the number of the homologous points is as follows:
Figure BDA0003289970250000092
wherein, Count represents the number of homologous points of the clustering center X in a single calculation, dist (P)iX) represents a point PiEuclidean distance from the cluster center X. Similar to the radius parameter Eps, to avoid samplingRepeating the operations of logarithmic sampling for K times, selecting a clustering center and calculating the number of similar points of the clustering center to obtain K number values, and calculating the average value of the K number values as a neighborhood density threshold Minpts, wherein the formula is as follows:
Figure BDA0003289970250000093
wherein, Count1,Count2,…,CountkRespectively, 1 st, 2 nd, … th and Kth numeric values. In the embodiment, similar to the radius parameter Eps, the neighborhood density threshold Minpts is determined in a self-adaptive manner, in clustering, the radius Eps and the neighborhood density threshold Minpts need to be determined in advance, and the two values often bring distinct clustering results according to different selections, so that the accuracy of a final returned result is influenced.
Illustratively, constructing the clustering graph based on the radius and the neighborhood density threshold comprises:
starting from any semantic vector in at least one semantic vector, acquiring the number of neighborhood points of any semantic vector according to the radius, if the number of the neighborhood points is less than a neighborhood density threshold, determining any semantic vector as a boundary point, and if the number of the neighborhood points is greater than or equal to the neighborhood density threshold, determining any semantic vector as a core point;
if any semantic vector is a core point, determining a point with the reachable density of any semantic vector and the density of any semantic vector as a fragment cluster, if any semantic vector is a boundary point, adding any semantic vector into the fragment cluster to which the core point with the reachable density of any semantic vector belongs until the core point in at least one semantic vector is clustered, and obtaining at least one fragment cluster;
and endowing an edge for the neighborhood point in each fragment cluster in at least one fragment cluster to obtain a clustering map.
Specifically, each fragment data corresponds to one semantic vector, the semantic vectors are all represented as one point in a high-dimensional space, any one semantic vector is represented as a point p, the number of neighborhood points of the point p is determined according to a preset radius Eps, if the number of the neighborhood points of the point p is smaller than a neighborhood density threshold value Minpts, the point p is a boundary point, if the number of the neighborhood points of the point p is larger than or equal to the neighborhood density threshold value Minpts, the point p is a core point, as shown in fig. 3, if the neighborhood density threshold value Minpts is 3, 3 points exist in a neighborhood of the p point, the p point is the core point, and if only two points exist in a neighborhood of the q point, the q point is the boundary point. If the point p is a core point, a segment cluster can be determined, and all points with the reachable density of the point p belong to the segment cluster, if the point p is a boundary point, the point p can be divided into the segment clusters to which the core point with the reachable density belongs, all the segment clusters to which the core point belongs are determined according to the method, at least one segment cluster is obtained, for each segment cluster in the at least one segment cluster, an edge is given to a neighborhood point in each segment cluster, for example, in the neighborhood of the point p in the point q in fig. 3, an edge is given to the point p and the point q, in the neighborhood of the point s, an edge is given to the point p and the point s, a graph corresponding to each segment cluster is obtained, the clustering graphs form the clustering graph, and the clustering graph is stored for subsequent matching.
Illustratively, matching the query vector with at least one segment cluster in a pre-constructed clustering graph to determine a target segment cluster to which the query statement belongs from the at least one segment cluster, includes:
calculating, for each of the at least one segment cluster, an average of the core points in each segment cluster;
taking the average value of the core points in each fragment cluster as the clustering center of each fragment cluster;
calculating the target distance between the query vector and the clustering center of each fragment cluster;
and determining the segment cluster represented by the cluster center with the minimum target distance in the at least one segment cluster as the target segment cluster.
The minimum target distance between the query vector and the cluster center indicates that the cluster center is closest to the query vector, and the query statement belongs to the category represented by the cluster center.
203: and selecting at least one fragment from the target fragment cluster, obtaining an updated query statement according to the at least one fragment, and calculating the posterior probability of the updated query statement and the first fragment in the target fragment cluster.
In the embodiment of the application, a target fragment cluster is used as a database for Query, a posterior probability of each fragment in a Query vector and the target fragment cluster is calculated, the fragments in the target fragment cluster are sorted according to the posterior probability, at least one fragment with the posterior probability being greater than or equal to a preset value is selected, for example, at least one fragment is P1, P2 and P3 respectively, P1, P2 and P3 are combined with a Query statement respectively to obtain an updated Query statement, for example, Query P1 is formed by P1, the updated Query statement Query P1 is used as a new input, and the posterior probability of the updated Query statement Query P1 and a first fragment in the target fragment cluster is calculated, wherein the first fragment is a fragment in the target fragment cluster except P1.
204: and repeating the operation of selecting at least one fragment from the target fragment cluster according to the last obtained posterior probability and obtaining the current updated query statement according to the at least one fragment until no fragment directly connected with the at least one currently selected fragment exists in the target fragment cluster.
In the embodiment of the application, according to the posterior probability obtained by the last calculation, the fragments in the target fragment cluster except for P1 are sorted, at least one fragment with the posterior probability being greater than or equal to the preset value is selected, for example, the at least one fragment is P11, P12, P13, P1, P2, P3 and the last input Query P1 form an updated Query statement, for example, the at least one fragment and P12 form a currently updated Query statement P1P 12, and the above operations are repeated until at least one currently selected fragment in each path of P1, P2, and P3 does not have a directly connected fragment in the target fragment cluster, that is, a fragment having a correlation with at least one currently selected fragment does not exist in the target fragment cluster through analysis of a clustering map.
205: and calculating the target posterior probability of the current updated query statement and the second segment in the target segment cluster, and returning the open domain question-answer result of the query statement according to the target posterior probability.
In the embodiment of the application, assuming that after a current updated Query statement Query P1P 12 is formed, a segment directly connected to P12 does not exist in a target segment cluster, the update input is stopped, a target posterior probability of the current updated Query statement Query P1P 12 and a second segment in the target segment cluster is calculated, where the second segment is a segment other than P12 in the target segment cluster, the second segment is ranked according to the target posterior probability, a segment with the maximum target posterior probability, such as P115, is selected, P115, P12, and P1 are used as open-domain question-answer results of the Query statement, and then the open-domain question-answer result is returned to a user. Of course, the above is taken as an example, in an actual scenario, there are also updated query statements composed of P2 and P3, and the maximum target posterior probability refers to the maximum posterior probability in all current updated query statements. According to the requirement of the correlation between the segments, the number of at least one segment selected each time can be the same or different, for example, the input is updated later, the value of the calculated posterior probability may not be high overall, and therefore, the number of at least one segment can be in a descending trend to reduce the calculation amount.
It can be seen that, in the embodiment of the present application, an input query statement is encoded by using a pre-training model, so as to obtain a query vector of the query statement; matching the query vector with at least one segment cluster in a pre-constructed clustering graph to determine a target segment cluster to which the query statement belongs from the at least one segment cluster; selecting at least one fragment from the target fragment cluster, obtaining an updated query statement according to the at least one fragment, and calculating the posterior probability of the updated query statement and the first fragment in the target fragment cluster; repeatedly executing the operation of selecting at least one fragment from the target fragment cluster according to the last obtained posterior probability and obtaining the current updated query statement according to the at least one fragment until no fragment directly connected with the currently selected at least one fragment exists in the target fragment cluster; and calculating the target posterior probability of the current updated query statement and the second segment in the target segment cluster, and returning the open domain question-answer result of the query statement according to the target posterior probability. The method includes the steps of firstly clustering fragment data to obtain a clustering graph, when query sentences are input, selecting a target fragment cluster from at least one fragment cluster, using the target fragment cluster as a database, screening at least one fragment from a layer in the target fragment cluster, and returning a fragment with the maximum target posterior probability and a fragment related to the fragment with the maximum target posterior probability in each layer as an open domain question-answer result.
Referring to fig. 4, fig. 4 is a schematic flow chart of another open-domain question-answer prediction method based on a pre-training model according to an embodiment of the present application, as shown in fig. 4, including steps 401 and 410:
401: coding fragment data of each field by adopting a pre-training model to obtain at least one semantic vector;
402: carrying out logarithmic sampling on at least one semantic vector to obtain a first target number of points, and calculating the average distance between the points of the first target number of points;
403: repeatedly executing operations of carrying out logarithmic sampling and calculating the average distance between the points on at least one semantic vector for K times to obtain the average distance between the K points, and taking the average value of the average distance between the K points as the radius of the cluster;
404: carrying out logarithmic sampling on at least one semantic vector to obtain a second target number of points, randomly selecting one point from the second target number of points as a clustering center, and calculating the number of similar points of the clustering center according to the radius and a predefined discriminant function;
405: repeatedly performing operations of performing logarithmic sampling on at least one semantic vector for K times, randomly selecting a point as a clustering center and calculating the number of similar points of the clustering center to obtain K number values, and taking the average value of the K number values as a neighborhood density threshold value of clustering;
406: coding the input query statement by adopting a pre-training model to obtain a query vector of the query statement;
407: matching the query vector with at least one segment cluster in a pre-constructed clustering graph to determine a target segment cluster to which the query statement belongs from the at least one segment cluster; the at least one fragment cluster is obtained by clustering based on a radius and a neighborhood density threshold;
408: selecting at least one fragment from the target fragment cluster, obtaining an updated query statement according to the at least one fragment, and calculating the posterior probability of the updated query statement and a first fragment in the target fragment cluster;
409: repeatedly executing the operation of selecting at least one segment from the target segment cluster according to the last obtained posterior probability and obtaining the current updated query statement according to the at least one segment until no segment directly connected with the at least one currently selected segment exists in the target segment cluster;
410: and calculating the target posterior probability of the current updated query statement and the second segment in the target segment cluster, and returning the open domain question-answer result of the query statement according to the target posterior probability.
The specific implementation of steps 401 and 410 has been described in the embodiment shown in fig. 2, and can achieve the same or similar beneficial effects, and is not repeated here for avoiding repetition.
Please refer to fig. 5 based on the description of the embodiment of the open-domain question-answer prediction method based on the pre-training model, where fig. 5 is a schematic structural diagram of an open-domain question-answer prediction apparatus based on the pre-training model according to the embodiment of the present application, and as shown in fig. 5, the apparatus includes:
the encoding unit 501 is configured to encode the input query statement by using a pre-training model to obtain a query vector of the query statement;
a matching unit 502, configured to match the query vector with at least one segment cluster in a pre-constructed clustering map, so as to determine a target segment cluster to which the query statement belongs from the at least one segment cluster;
an updating unit 503, configured to select at least one segment from the target segment cluster, obtain an updated query statement according to the at least one segment, and calculate a posterior probability between the updated query statement and a first segment in the target segment cluster;
the updating unit 503 is further configured to repeatedly perform operations of selecting at least one segment from the target segment cluster according to the last obtained posterior probability and obtaining a currently updated query statement according to the at least one segment until no segment directly connected to the currently selected at least one segment exists in the target segment cluster;
the predicting unit 504 is configured to calculate a target posterior probability of the current updated query statement and the second segment in the target segment cluster, and return an open domain question-answer result of the query statement according to the target posterior probability.
It can be seen that, in the open-domain question-answer prediction apparatus based on the pre-training model shown in fig. 5, the pre-training model is used to encode the input query statement, so as to obtain the query vector of the query statement; matching the query vector with at least one segment cluster in a pre-constructed clustering graph to determine a target segment cluster to which the query statement belongs from the at least one segment cluster; selecting at least one fragment from the target fragment cluster, obtaining an updated query statement according to the at least one fragment, and calculating the posterior probability of the updated query statement and a first fragment in the target fragment cluster; repeatedly executing the operation of selecting at least one segment from the target segment cluster according to the last obtained posterior probability and obtaining the current updated query statement according to the at least one segment until no segment directly connected with the at least one currently selected segment exists in the target segment cluster; and calculating the target posterior probability of the current updated query statement and the second segment in the target segment cluster, and returning the open domain question-answer result of the query statement according to the target posterior probability. The method includes the steps of firstly clustering fragment data to obtain a clustering graph, when query sentences are input, selecting a target fragment cluster from at least one fragment cluster, using the target fragment cluster as a database, screening at least one fragment from a layer in the target fragment cluster, and returning a fragment with the maximum target posterior probability and a fragment related to the fragment with the maximum target posterior probability in each layer as an open domain question-answer result.
In a possible embodiment, at least one fragment cluster is obtained by clustering fragment data of each domain, and the encoding unit 501 is further configured to:
determining a radius and a neighborhood density threshold value adopted for clustering fragment data of each field in a clustering algorithm;
and constructing a clustering graph based on the radius and the neighborhood density threshold.
In a possible implementation manner, in determining the radius used for clustering the fragment data of each field in the clustering algorithm, the encoding unit 501 is specifically configured to:
coding fragment data of each field by adopting a pre-training model to obtain at least one semantic vector;
carrying out logarithmic sampling on at least one semantic vector to obtain a first target number of points;
calculating the average distance between the points of the first target number of points;
repeatedly performing operations of performing logarithmic sampling and calculating the average distance between the points on at least one semantic vector for K times to obtain the average distance between the K points, wherein K is an integer greater than 1;
the average of the average distances between K points was taken as the radius.
In one possible implementation, in determining a neighborhood density threshold used for clustering fragment data of each field in a clustering algorithm, the encoding unit 501 is specifically configured to:
carrying out logarithmic sampling on at least one semantic vector to obtain a second target number of points;
randomly selecting one point from the second target quantity points as a clustering center, and calculating the quantity of the same type points of the clustering center according to the radius and a predefined discriminant function;
repeatedly performing K times of logarithmic sampling on at least one semantic vector, randomly selecting a point as a clustering center and calculating the number of the similar points of the clustering center to obtain K number values;
and taking the average value of the K number of magnitude values as a neighborhood density threshold value.
In a possible implementation, in constructing the cluster map based on the radius and the neighborhood density threshold, the encoding unit 501 is specifically configured to:
starting from any semantic vector in at least one semantic vector, acquiring the number of neighborhood points of any semantic vector according to the radius, if the number of the neighborhood points is less than a neighborhood density threshold, determining any semantic vector as a boundary point, and if the number of the neighborhood points is greater than or equal to the neighborhood density threshold, determining any semantic vector as a core point;
if any semantic vector is a core point, determining a point with the reachable density of any semantic vector and the density of any semantic vector as a fragment cluster, if any semantic vector is a boundary point, adding any semantic vector into the fragment cluster to which the core point with the reachable density of any semantic vector belongs until the core point in at least one semantic vector is clustered, and obtaining at least one fragment cluster;
and endowing an edge for the neighborhood point in each fragment cluster in at least one fragment cluster to obtain a clustering map.
In a possible implementation manner, in matching the query vector with at least one segment cluster in the pre-constructed cluster map to determine a target segment cluster to which the query statement belongs from the at least one segment cluster, the matching unit 502 is specifically configured to:
calculating, for each of the at least one segment cluster, an average of the core points in each segment cluster;
taking the average value of the core points in each fragment cluster as the clustering center of each fragment cluster;
calculating the target distance between the query vector and the clustering center of each fragment cluster;
and determining the segment cluster represented by the cluster center with the minimum target distance in the at least one segment cluster as the target segment cluster.
In a possible implementation manner, in terms of encoding an input query statement by using a pre-training model to obtain a query vector of the query statement, the encoding unit 501 is specifically configured to:
preprocessing a query statement to obtain a word vector of the query statement;
calculating based on the word vectors to obtain a query matrix, a key matrix and a value matrix;
calculating to obtain attention weight based on the query matrix, the key matrix and the value matrix;
and multiplying the attention weight by the value matrix to obtain an attention vector, and encoding the attention vector to obtain a query vector.
According to an embodiment of the present application, the units of the open-domain question-answering prediction apparatus based on the pre-trained model shown in fig. 5 may be respectively or completely combined into one or several other units to form the open-domain question-answering prediction apparatus, or some unit(s) thereof may be further split into multiple functionally smaller units to form the open-domain question-answering prediction apparatus, which may implement the same operation without affecting implementation of technical effects of embodiments of the present application. The units are divided based on logic functions, and in practical application, the functions of one unit can be realized by a plurality of units, or the functions of a plurality of units can be realized by one unit. In other embodiments of the present application, the open-domain question-answering prediction apparatus based on the pre-trained model may also include other units, and in practical applications, these functions may also be implemented by assistance of other units, and may be implemented by cooperation of multiple units.
According to another embodiment of the present application, the open-domain question-and-answer prediction apparatus device based on the pre-trained model as shown in fig. 5 may be constructed by running a computer program (including program codes) capable of executing the steps involved in the corresponding method as shown in fig. 2 or fig. 4 on a general-purpose computing device, such as a computer, including a processing element and a storage element such as a Central Processing Unit (CPU), a random access storage medium (RAM), a read-only storage medium (ROM), and the like, and the open-domain question-and-answer prediction method based on the pre-trained model according to an embodiment of the present application may be implemented. The computer program may be recorded on a computer-readable recording medium, for example, and loaded and executed in the above-described computing apparatus via the computer-readable recording medium.
Based on the description of the method embodiment and the device embodiment, the embodiment of the application further provides an electronic device. Referring to fig. 6, the electronic device includes at least a processor 601, an input device 602, an output device 603, and a computer storage medium 604. The processor 601, input device 602, output device 603, and computer storage medium 604 within the electronic device may be connected by a bus or other means.
A computer storage medium 604 may be stored in the memory of the electronic device, the computer storage medium 604 being for storing a computer program comprising program instructions, the processor 601 being for executing the program instructions stored by the computer storage medium 604. The processor 601 (or CPU) is a computing core and a control core of the electronic device, and is adapted to implement one or more instructions, and in particular, is adapted to load and execute the one or more instructions so as to implement a corresponding method flow or a corresponding function.
In one embodiment, the processor 601 of the electronic device provided in the embodiment of the present application may be configured to perform a series of open-domain question-answering prediction processes based on a pre-trained model:
coding the input query statement by adopting a pre-training model to obtain a query vector of the query statement;
matching the query vector with at least one segment cluster in a pre-constructed clustering graph to determine a target segment cluster to which the query statement belongs from the at least one segment cluster;
selecting at least one fragment from the target fragment cluster, obtaining an updated query statement according to the at least one fragment, and calculating the posterior probability of the updated query statement and a first fragment in the target fragment cluster;
repeatedly executing the operation of selecting at least one segment from the target segment cluster according to the last obtained posterior probability and obtaining the current updated query statement according to the at least one segment until no segment directly connected with the at least one currently selected segment exists in the target segment cluster;
and calculating the target posterior probability of the current updated query statement and the second segment in the target segment cluster, and returning the open domain question-answer result of the query statement according to the target posterior probability.
It can be seen that, in the electronic device shown in fig. 6, the query vectors of the query sentences are obtained by encoding the input query sentences with the pre-training model; matching the query vector with at least one segment cluster in a pre-constructed clustering graph to determine a target segment cluster to which the query statement belongs from the at least one segment cluster; selecting at least one fragment from the target fragment cluster, obtaining an updated query statement according to the at least one fragment, and calculating the posterior probability of the updated query statement and a first fragment in the target fragment cluster; repeatedly executing the operation of selecting at least one segment from the target segment cluster according to the last obtained posterior probability and obtaining the current updated query statement according to the at least one segment until no segment directly connected with the at least one currently selected segment exists in the target segment cluster; and calculating the target posterior probability of the current updated query statement and the second segment in the target segment cluster, and returning the open domain question-answer result of the query statement according to the target posterior probability. The method includes the steps of firstly clustering fragment data to obtain a clustering graph, when query sentences are input, selecting a target fragment cluster from at least one fragment cluster, using the target fragment cluster as a database, screening at least one fragment from a layer in the target fragment cluster, and returning a fragment with the maximum target posterior probability and a fragment related to the fragment with the maximum target posterior probability in each layer as an open domain question-answer result.
In another embodiment, at least one fragment cluster is obtained by clustering fragment data of each field, and before encoding an input query statement using a pre-training model to obtain a query vector of the query statement, the processor 601 is further configured to:
determining a radius and a neighborhood density threshold value adopted for clustering fragment data of each field in a clustering algorithm;
and constructing a clustering graph based on the radius and the neighborhood density threshold.
In another embodiment, the processor 601 executes a clustering algorithm to determine the radius to cluster the fragment data of each domain, including:
coding fragment data of each field by adopting a pre-training model to obtain at least one semantic vector;
carrying out logarithmic sampling on at least one semantic vector to obtain a first target number of points;
calculating the average distance between the points of the first target number of points;
repeatedly performing, K times, the operations of logarithmically sampling the at least one semantic vector and calculating the average distance between the sampled points, to obtain K average distances, wherein K is an integer greater than 1;
and taking the average of the K average distances as the radius.
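Assuming that "logarithmic sampling" draws on the order of log2(N) points per round and that distances are Euclidean (neither is fixed by the text), the radius estimation can be sketched as follows; the function and parameter names are illustrative:

```python
import numpy as np

def estimate_radius(vectors, k_rounds=5, seed=None):
    """Average, over K sampling rounds, of the mean pairwise distance
    among a logarithmically sized random sample of semantic vectors."""
    rng = np.random.default_rng(seed)
    n = len(vectors)
    m = max(2, int(np.log2(n)))  # first target number of points (assumed log2)
    round_means = []
    for _ in range(k_rounds):
        pts = vectors[rng.choice(n, size=m, replace=False)]
        # mean of all pairwise Euclidean distances within the sample
        d = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
        round_means.append(d[np.triu_indices(m, k=1)].mean())
    return float(np.mean(round_means))  # radius = average of the K averages
```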
In another embodiment, the processor 601 performs the determining of the neighborhood density threshold used in the clustering algorithm for clustering the fragment data of each domain, which includes:
carrying out logarithmic sampling on at least one semantic vector to obtain a second target number of points;
randomly selecting one point from the second target number of points as a clustering center, and counting, according to the radius and a predefined discriminant function, the number of points of the same type as the clustering center;
repeatedly performing, K times, the operations of logarithmically sampling the at least one semantic vector, randomly selecting a point as a clustering center, and counting the same-type points of the clustering center, to obtain K count values;
and taking the average of the K count values as the neighborhood density threshold.
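Assuming that "logarithmic sampling" draws on the order of log2(N) points per round and that the "predefined discriminant function" simply tests whether a sampled point lies within the radius of the chosen clustering center (both are assumptions; neither is specified in the text), the threshold estimation can be sketched as follows:

```python
import numpy as np

def estimate_density_threshold(vectors, radius, k_rounds=5, seed=None):
    """Average, over K rounds, of the number of sampled points that fall
    within `radius` of a randomly chosen clustering center."""
    rng = np.random.default_rng(seed)
    n = len(vectors)
    m = max(2, int(np.log2(n)))  # second target number of points (assumed log2)
    counts = []
    for _ in range(k_rounds):
        pts = vectors[rng.choice(n, size=m, replace=False)]
        center = pts[rng.integers(m)]  # random clustering center
        # assumed discriminant: same-type if within the radius of the center
        within = np.linalg.norm(pts - center, axis=1) <= radius
        counts.append(int(within.sum()) - 1)  # exclude the center itself
    return float(np.mean(counts))  # neighborhood density threshold
```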
In yet another embodiment, the processor 601 performs the constructing of the clustering graph based on the radius and the neighborhood density threshold, which includes:
starting from any semantic vector in the at least one semantic vector, acquiring the number of neighborhood points of the semantic vector according to the radius; if the number of neighborhood points is less than the neighborhood density threshold, determining the semantic vector as a boundary point, and if the number of neighborhood points is greater than or equal to the neighborhood density threshold, determining the semantic vector as a core point;
if the semantic vector is a core point, determining the points that are density-reachable from and density-connected with the semantic vector as one fragment cluster; if the semantic vector is a boundary point, adding the semantic vector to the fragment cluster of the density-reachable core point to which it belongs, until all core points in the at least one semantic vector are clustered, so as to obtain at least one fragment cluster;
and assigning an edge between each pair of neighborhood points in each of the at least one fragment cluster to obtain the clustering graph.
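This construction closely mirrors the DBSCAN algorithm, with edges added between neighborhood points inside each cluster. A minimal self-contained sketch (Euclidean distance assumed; noise points keep label -1):

```python
import numpy as np

def build_cluster_graph(vectors, radius, min_pts):
    """DBSCAN-style clustering plus intra-cluster neighborhood edges."""
    n = len(vectors)
    dist = np.linalg.norm(vectors[:, None, :] - vectors[None, :, :], axis=-1)
    neigh = [set(np.flatnonzero(dist[i] <= radius)) - {i} for i in range(n)]
    core = [len(neigh[i]) >= min_pts for i in range(n)]  # core vs boundary
    labels, cid = [-1] * n, 0
    for i in range(n):
        if not core[i] or labels[i] != -1:
            continue
        labels[i], stack = cid, [i]
        while stack:  # expand the density-reachable set of core point i
            p = stack.pop()
            for q in neigh[p]:
                if labels[q] == -1:
                    labels[q] = cid  # boundary points join the reachable core's cluster
                    if core[q]:
                        stack.append(q)
        cid += 1
    # assign an edge to each pair of neighborhood points in the same cluster
    edges = [(i, j) for i in range(n) for j in neigh[i]
             if i < j and labels[i] == labels[j] != -1]
    return labels, edges
```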
In another embodiment, the matching of the query vector and at least one segment cluster in the pre-constructed clustering graph by the processor 601 to determine a target segment cluster to which the query statement belongs from the at least one segment cluster includes:
calculating, for each of the at least one segment cluster, an average of the core points in each segment cluster;
taking the average value of the core points in each fragment cluster as the clustering center of each fragment cluster;
calculating the target distance between the query vector and the clustering center of each fragment cluster;
and determining the segment cluster represented by the cluster center with the minimum target distance in the at least one segment cluster as the target segment cluster.
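The matching step reduces to nearest-centroid search. A minimal sketch, assuming each cluster is represented by the core-point vectors it contains and that the target distance is Euclidean:

```python
import numpy as np

def match_target_cluster(query_vec, core_points_by_cluster):
    """Return the id of the cluster whose center (the mean of its core
    points) is closest to the query vector."""
    centers = {cid: pts.mean(axis=0)  # clustering center of each cluster
               for cid, pts in core_points_by_cluster.items()}
    return min(centers,
               key=lambda cid: np.linalg.norm(query_vec - centers[cid]))
```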
In another embodiment, the processor 601 performs encoding on the input query statement by using the pre-training model to obtain a query vector of the query statement, including:
preprocessing a query statement to obtain a word vector of the query statement;
calculating based on the word vectors to obtain a query matrix, a key matrix and a value matrix;
calculating to obtain attention weight based on the query matrix, the key matrix and the value matrix;
and multiplying the attention weight by the value matrix to obtain an attention vector, and encoding the attention vector to obtain a query vector.
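The encoding step described above is standard scaled dot-product self-attention. The sketch below assumes learned projection matrices Wq, Wk, Wv and the usual 1/sqrt(d) scaling (the scaling factor is conventional Transformer practice, not stated in the text):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(word_vecs, Wq, Wk, Wv):
    """word_vecs: (seq_len, d) word vectors of the query statement."""
    Q, K, V = word_vecs @ Wq, word_vecs @ Wk, word_vecs @ Wv
    weights = softmax(Q @ K.T / np.sqrt(K.shape[-1]))  # attention weights
    return weights @ V  # attention vectors, later encoded into the query vector
```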
By way of example, the electronic device includes, but is not limited to, a processor 601, an input device 602, an output device 603, and a computer storage medium 604; it may also include a memory, a power supply, an application client module, and the like. The input device 602 may be a keyboard, a touch screen, a radio frequency receiver, etc., and the output device 603 may be a speaker, a display, a radio frequency transmitter, etc. It will be appreciated by those skilled in the art that the schematic diagram is merely an example of an electronic device and does not limit the electronic device, which may include more or fewer components than shown, a combination of some components, or different components.
It should be noted that, since the processor 601 of the electronic device executes the computer program to implement the steps in the open-domain question-answer prediction method based on the pre-trained model, the embodiments of the open-domain question-answer prediction method based on the pre-trained model are all applicable to the electronic device, and all can achieve the same or similar beneficial effects.
An embodiment of the present application further provides a computer storage medium (memory), which is a memory device in an electronic device and is used to store programs and data. It is understood that the computer storage medium here may include a built-in storage medium of the terminal, and may also include an extended storage medium supported by the terminal. The computer storage medium provides a storage space that stores the operating system of the terminal. Also stored in this storage space are one or more instructions, which may be one or more computer programs (including program code), suitable for being loaded and executed by the processor 601. The computer storage medium may be a high-speed RAM, or a non-volatile memory such as at least one magnetic disk memory; optionally, it may also be at least one computer storage medium located remotely from the processor 601. In one embodiment, the one or more instructions stored in the computer storage medium may be loaded and executed by the processor 601 to implement the corresponding steps of the open-domain question-answer prediction method based on a pre-trained model described above.
Illustratively, the computer program of the computer storage medium includes computer program code, which may be in the form of source code, object code, an executable file or some intermediate form, and the like. The computer-readable medium may include: any entity or device capable of carrying the computer program code, recording medium, usb disk, removable hard disk, magnetic disk, optical disk, computer Memory, Read-Only Memory (ROM), Random Access Memory (RAM), electrical carrier wave signals, telecommunications signals, software distribution medium, and the like.
It should be noted that, since the computer program of the computer storage medium is executed by the processor to implement the steps in the open-domain question-answer prediction method based on the pre-trained model, all the embodiments of the open-domain question-answer prediction method based on the pre-trained model are applicable to the computer storage medium, and can achieve the same or similar beneficial effects.
The embodiments of the present application have been described in detail above, and specific examples are used herein to illustrate the principles and implementations of the present application; the above description of the embodiments is provided only to help understand the method and core concept of the present application. Meanwhile, for a person skilled in the art, there may be variations in the specific embodiments and the application scope according to the idea of the present application. In summary, the content of this specification should not be construed as limiting the present application.

Claims (10)

1. An open domain question-answer prediction method based on a pre-training model is characterized by comprising the following steps:
coding an input query statement by adopting a pre-training model to obtain a query vector of the query statement;
matching the query vector with at least one segment cluster in a pre-constructed clustering graph to determine a target segment cluster to which the query statement belongs from the at least one segment cluster;
selecting at least one fragment from the target fragment cluster, obtaining an updated query statement according to the at least one fragment, and calculating the posterior probability of the updated query statement and the first fragment in the target fragment cluster;
repeatedly executing the operation of selecting at least one fragment from the target fragment cluster according to the last obtained posterior probability and obtaining the current updated query statement according to the at least one fragment until no fragment directly connected with the currently selected at least one fragment exists in the target fragment cluster;
and calculating the target posterior probability of the current updated query statement and the second segment in the target segment cluster, and returning the open domain question-answer result of the query statement according to the target posterior probability.
2. The method according to claim 1, wherein the at least one fragment cluster is obtained by clustering fragment data of each field, and before encoding an input query statement using a pre-training model to obtain a query vector of the query statement, the method further comprises:
determining a radius and a neighborhood density threshold value adopted for clustering fragment data of each field in a clustering algorithm;
and constructing the clustering graph based on the radius and the neighborhood density threshold value.
3. The method of claim 2, wherein determining the radius used by the clustering algorithm to cluster the fragment data of each domain comprises:
coding the fragment data of each field by adopting the pre-training model to obtain at least one semantic vector;
carrying out logarithmic sampling on the at least one semantic vector to obtain a first target number of points;
calculating the average distance between the points of the first target number of points;
repeatedly performing, K times, the operations of logarithmically sampling the at least one semantic vector and calculating the average distance between the sampled points, to obtain K average distances, wherein K is an integer greater than 1;
and taking the average of the K average distances as the radius.
4. The method of claim 3, wherein determining a neighborhood density threshold for clustering fragment data of each domain in a clustering algorithm comprises:
carrying out logarithmic sampling on the at least one semantic vector to obtain a second target number of points;
randomly selecting one point from the second target number of points as a clustering center, and counting, according to the radius and a predefined discriminant function, the number of points of the same type as the clustering center;
repeatedly performing, K times, the operations of logarithmically sampling the at least one semantic vector, randomly selecting a point as a clustering center, and counting the same-type points of the clustering center, to obtain K count values;
and taking the average of the K count values as the neighborhood density threshold.
5. The method of claim 3, wherein constructing the clustering graph based on the radius and the neighborhood density threshold comprises:
starting from any semantic vector in the at least one semantic vector, acquiring the number of neighborhood points of the semantic vector according to the radius; if the number of neighborhood points is smaller than the neighborhood density threshold, determining the semantic vector as a boundary point, and if the number of neighborhood points is greater than or equal to the neighborhood density threshold, determining the semantic vector as a core point;
if the semantic vector is a core point, determining the points that are density-reachable from and density-connected with the semantic vector as one fragment cluster; if the semantic vector is a boundary point, adding the semantic vector to the fragment cluster of the density-reachable core point to which it belongs, until all core points in the at least one semantic vector are clustered, so as to obtain the at least one fragment cluster;
and assigning an edge between each pair of neighborhood points in each of the at least one fragment cluster to obtain the clustering graph.
6. The method according to any one of claims 1 to 4, wherein the matching the query vector with at least one segment cluster in a pre-constructed clustering graph to determine a target segment cluster to which the query statement belongs from the at least one segment cluster comprises:
calculating, for each of the at least one segment cluster, an average of core points in the each segment cluster;
taking the average value of the core points in each segment cluster as the clustering center of each segment cluster;
calculating a target distance between the query vector and the clustering center of each fragment cluster;
and determining the segment cluster represented by the cluster center with the minimum target distance in the at least one segment cluster as the target segment cluster.
7. The method of claim 1, wherein the encoding the input query statement using the pre-training model to obtain the query vector of the query statement comprises:
preprocessing the query statement to obtain a word vector of the query statement;
calculating based on the word vectors to obtain a query matrix, a key matrix and a value matrix;
calculating attention weight based on the query matrix, the key matrix and the value matrix;
and multiplying the attention weight by the value matrix to obtain an attention vector, and encoding the attention vector to obtain the query vector.
8. An open-domain question-answer prediction device based on a pre-training model, the device comprising:
the encoding unit is used for encoding the input query statement by adopting a pre-training model to obtain a query vector of the query statement;
the matching unit is used for matching the query vector with at least one segment cluster in a pre-constructed clustering map so as to determine a target segment cluster to which the query statement belongs from the at least one segment cluster;
the updating unit is used for selecting at least one segment from the target segment cluster, obtaining an updated query statement according to the at least one segment, and calculating the posterior probability of the updated query statement and the first segment in the target segment cluster;
the updating unit is further configured to repeatedly perform an operation of selecting at least one segment from the target segment cluster according to the last obtained posterior probability and obtaining a currently updated query statement according to the at least one segment until no segment directly connected with the currently selected at least one segment exists in the target segment cluster;
and the prediction unit is used for calculating the target posterior probability of the current updated query statement and the second segment in the target segment cluster, and returning the open domain question-answer result of the query statement according to the target posterior probability.
9. An electronic device comprising an input device and an output device, further comprising:
a processor adapted to implement one or more instructions; and the number of the first and second groups,
a computer storage medium having one or more instructions stored thereon, the one or more instructions adapted to be loaded by the processor and to perform the method of any of claims 1-7.
10. A computer storage medium having stored thereon one or more instructions adapted to be loaded by a processor and to perform the method of any of claims 1-7.
CN202111167748.7A 2021-09-30 2021-09-30 Open domain question-answer prediction method based on pre-training model and related equipment Active CN113723115B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111167748.7A CN113723115B (en) 2021-09-30 2021-09-30 Open domain question-answer prediction method based on pre-training model and related equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111167748.7A CN113723115B (en) 2021-09-30 2021-09-30 Open domain question-answer prediction method based on pre-training model and related equipment

Publications (2)

Publication Number Publication Date
CN113723115A true CN113723115A (en) 2021-11-30
CN113723115B CN113723115B (en) 2024-02-09

Family

ID=78685636

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111167748.7A Active CN113723115B (en) 2021-09-30 2021-09-30 Open domain question-answer prediction method based on pre-training model and related equipment

Country Status (1)

Country Link
CN (1) CN113723115B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115687031A (en) * 2022-11-15 2023-02-03 北京优特捷信息技术有限公司 Method, device, equipment and medium for generating alarm description text
WO2023108995A1 (en) * 2021-12-15 2023-06-22 平安科技(深圳)有限公司 Vector similarity calculation method and apparatus, device and storage medium

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140358928A1 (en) * 2013-06-04 2014-12-04 International Business Machines Corporation Clustering Based Question Set Generation for Training and Testing of a Question and Answer System
US20150293970A1 (en) * 2014-04-10 2015-10-15 Beijing Baidu Netcom Science And Technology Co., Ltd. Information searching method and device
US20180293302A1 (en) * 2017-04-06 2018-10-11 International Business Machines Corporation Natural question generation from query data using natural language processing system
CN110750629A (en) * 2019-09-18 2020-02-04 平安科技(深圳)有限公司 Robot dialogue generation method and device, readable storage medium and robot
CN112487173A (en) * 2020-12-18 2021-03-12 北京百度网讯科技有限公司 Man-machine conversation method, device and storage medium
KR20210051523A (en) * 2019-10-30 2021-05-10 주식회사 솔트룩스 Dialogue system by automatic domain classfication
CN113139042A (en) * 2021-04-25 2021-07-20 内蒙古工业大学 Emotion controllable reply generation method using fine-tuning and reordering strategy
WO2021169842A1 (en) * 2020-02-24 2021-09-02 京东方科技集团股份有限公司 Method and apparatus for updating data, electronic device, and computer readable storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
TIANJIAO GUO: "Course Question Answering System Based on Artificial Intelligence", APPLICATION OF INTELLIGENT SYSTEMS IN MULTI-MODAL INFORMATION ANALYTICS. 2021 INTERNATIONAL CONFERENCE ON MULTI-MODAL INFORMATION ANALYTICS (MMIA 2021). ADVANCES IN INTELLIGENT SYSTEMS AND COMPUTING, vol. 2, pages 723 - 730 *

Also Published As

Publication number Publication date
CN113723115B (en) 2024-02-09

Similar Documents

Publication Publication Date Title
CN110766142A (en) Model generation method and device
US9852177B1 (en) System and method for generating automated response to an input query received from a user in a human-machine interaction environment
CN113723115A (en) Open domain question-answer prediction method based on pre-training model and related equipment
CN113268609A (en) Dialog content recommendation method, device, equipment and medium based on knowledge graph
CN114691828A (en) Data processing method, device, equipment and medium
CN111507108B (en) Alias generation method and device, electronic equipment and computer readable storage medium
CN114358023A (en) Intelligent question-answer recall method and device, computer equipment and storage medium
CN110489730A (en) Text handling method, device, terminal and storage medium
CN109474516B (en) Method and system for recommending instant messaging connection strategy based on convolutional neural network
CN116957128A (en) Service index prediction method, device, equipment and storage medium
Liu et al. Beyond top‐n accuracy indicator: a comprehensive evaluation indicator of cnn models in image classification
CN114880991A (en) Knowledge map question-answer entity linking method, device, equipment and medium
CN117795527A (en) Evaluation of output sequences using autoregressive language model neural networks
CN114268625B (en) Feature selection method, device, equipment and storage medium
CN111400413B (en) Method and system for determining category of knowledge points in knowledge base
CN111324722B (en) Method and system for training word weight model
CN110147881B (en) Language processing method, device, equipment and storage medium
CN113449079B (en) Text abstract generating method and device, electronic equipment and storage medium
CN116680390B (en) Vocabulary association recommendation method and system
US11755570B2 (en) Memory-based neural network for question answering
CN115146258B (en) Request processing method and device, storage medium and electronic equipment
CN111897884B (en) Data relationship information display method and terminal equipment
CN116992017A (en) Abnormal body detection method, device, equipment and storage medium
CN116414963A (en) Method, device and storage medium for inquiring reply content
CN117827887A (en) Recall method, system, electronic device and storage medium for complex domain dense channel

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant