WO2023134085A1 - Question answer prediction method and prediction apparatus, electronic device, and storage medium - Google Patents

Question answer prediction method and prediction apparatus, electronic device, and storage medium Download PDF

Info

Publication number
WO2023134085A1
WO2023134085A1 · PCT/CN2022/090750 · CN2022090750W
Authority
WO
WIPO (PCT)
Prior art keywords
candidate
question
original
text
vector
Prior art date
Application number
PCT/CN2022/090750
Other languages
French (fr)
Chinese (zh)
Inventor
舒畅
陈又新
Original Assignee
平安科技(深圳)有限公司 (Ping An Technology (Shenzhen) Co., Ltd.)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司 (Ping An Technology (Shenzhen) Co., Ltd.)
Publication of WO2023134085A1 publication Critical patent/WO2023134085A1/en

Links

Images

Classifications

    • G PHYSICS › G06 COMPUTING; CALCULATING OR COUNTING › G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor › G06F16/30 of unstructured textual data
    • G06F16/335 Filtering based on additional data, e.g. user or group profiles
    • G06F16/3329 Natural language query formulation or dialogue systems
    • G06F16/35 Clustering; Classification
    • G06F40/00 Handling natural language data › G06F40/35 Discourse or dialogue representation
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS › G06N3/02 Neural networks
    • G06N3/045 Combinations of networks
    • G06N3/08 Learning methods

Definitions

  • the present application relates to the technical field of artificial intelligence, and in particular to a method for predicting answers to questions, a predicting device, electronic equipment, and a storage medium.
  • Machine reading comprehension aims to enable machines to find the answer to a question in a given text. It is a basic application scenario in natural language processing and is widely used in question answering and dialogue systems.
  • the embodiment of the present application proposes a method for predicting answers to questions, including:
  • the original topic data includes original article data and original question data to be answered;
  • the associated data includes a question mark vector, a candidate mark vector corresponding to each of the candidate texts, and an association value; wherein the association value is used to characterize the association between the original question data and each of the candidate texts;
  • the embodiment of the present application proposes a device for predicting answers to questions, including:
  • the acquisition module is used to acquire the original topic data to be predicted; the original topic data includes original article data and original question data to be answered;
  • An encoding module configured to encode the original article data and the original question data according to a preset first pre-training model to obtain a question encoding vector and an article encoding vector;
  • An attention screening module configured to perform attention screening processing on the question encoding vector and the article encoding vector to obtain a plurality of candidate texts;
  • An association module configured to associate the original question data with each of the candidate texts to obtain associated data; wherein the associated data includes a question mark vector, a candidate mark vector corresponding to each of the candidate texts, and an association value; wherein the association value is used to characterize the association between the original question data and each of the candidate texts;
  • An answer screening module configured to perform answer screening processing on the question mark vector, the plurality of candidate texts, and each of the candidate mark vectors to obtain a confidence degree corresponding to each of the candidate texts; wherein the confidence degree is used to characterize the probability that the candidate text contains a candidate answer;
  • A processing module configured to determine a candidate position according to the association value, the confidence degree, and a preset prediction threshold; wherein the candidate position is the position of the candidate answer;
  • A matching module configured to match the corresponding candidate text according to the candidate position to obtain the candidate answer.
  • the embodiment of the present application provides an electronic device, including:
  • at least one program is stored in the memory, and the processor executes the at least one program to implement a method for predicting an answer to a question; wherein the method for predicting an answer to a question includes:
  • the original topic data includes original article data and original question data to be answered;
  • the associated data includes a question mark vector, a candidate mark vector corresponding to each of the candidate texts, and an association value; wherein the association value is used to characterize the association between the original question data and each of the candidate texts;
  • the embodiment of the present application provides a storage medium, the storage medium being a computer-readable storage medium that stores computer-executable instructions, the computer-executable instructions being used to cause a computer to carry out a method for predicting the answer to a question; wherein the method for predicting the answer to the question includes:
  • the original topic data includes original article data and original question data to be answered;
  • the associated data includes a question mark vector, a candidate mark vector corresponding to each of the candidate texts, and an association value; wherein the association value is used to characterize the association between the original question data and each of the candidate texts;
  • The method for predicting answers to questions, predicting device, electronic equipment, and storage medium proposed in this application can delete useless text information that has nothing to do with the answer through the attention screening mechanism, association processing, and answer screening processing, and effectively select the parts of the article that are relevant to the question, thereby improving the accuracy of the predicted answer.
  • Fig. 1 is a flowchart of the question answer prediction method provided by an embodiment of the present application;
  • Fig. 2 is a flowchart of a specific method of step S300 in Fig. 1;
  • Fig. 3 is a flowchart of a specific method of step S330 in Fig. 2;
  • Fig. 4 is a flowchart of a specific method of step S400 in Fig. 1;
  • Fig. 5 is a flowchart of a specific method of step S430 in Fig. 4;
  • Fig. 6 is a flowchart of a specific method of step S500 in Fig. 1;
  • Fig. 7 is a flowchart of a specific method of step S530 in Fig. 6;
  • FIG. 8 is a block diagram of a device for predicting answers to questions provided by an embodiment of the present application.
  • FIG. 9 is a schematic diagram of a hardware structure of an electronic device provided by an embodiment of the present application.
  • Artificial Intelligence is a new technical science that studies and develops theories, methods, technologies, and application systems for simulating, extending, and expanding human intelligence; it is a branch of computer science. Artificial intelligence attempts to understand the essence of intelligence and produce new intelligent machines that can respond in a manner similar to human intelligence. Research in this field includes robotics, speech recognition, image recognition, natural language processing, and expert systems. Artificial intelligence can simulate the information processes of human consciousness and thinking. It is also a theory, method, technology, and application system that uses digital computers, or machines controlled by digital computers, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use knowledge to obtain the best results.
  • Natural language processing uses computers to process, understand, and use human languages (such as Chinese, English, etc.). NLP is a branch of artificial intelligence and an interdisciplinary subject between computer science and linguistics, also known as computational linguistics. Natural language processing includes syntax analysis, semantic analysis, text understanding, etc. It is often used in technical fields such as machine translation, handwritten and printed character recognition, speech recognition and text-to-speech conversion, information retrieval, information extraction and filtering, text classification and clustering, and public opinion analysis and opinion mining. It involves data mining, machine learning, knowledge acquisition, knowledge engineering, and artificial intelligence research related to language processing, as well as linguistics research related to language computing.
  • Medical cloud refers to the use of "cloud computing" to create a medical and health service cloud based on new technologies such as cloud computing, mobile technology, multimedia, 4G communication, big data, and the Internet of Things, combined with medical technology.
  • The medical cloud platform realizes the sharing of medical resources and the expansion of medical coverage.
  • medical cloud improves the efficiency of medical institutions and facilitates residents to seek medical treatment. For example, appointment registration, electronic medical records, and medical insurance in hospitals are all products of the combination of cloud computing and the medical field. Medical cloud also has the advantages of data security, information sharing, dynamic expansion, and overall layout.
  • BERT (Bidirectional Encoder Representations from Transformers)
  • the BERT model further increases the generalization ability of the word vector model, fully describes the character-level, word-level, sentence-level and even inter-sentence relationship features, and is built based on Transformer.
  • Token Embeddings are word vectors; the first token is the [CLS] mark, which can be used for subsequent classification tasks.
  • Segment Embeddings are used to distinguish two kinds of sentences, because pre-training involves not only language modeling but also classification tasks that take two sentences as input.
  • Position Embeddings: the position vectors here are not the trigonometric (sinusoidal) functions used in the Transformer, but are learned by BERT during training.
  • BERT directly trains Position Embeddings to retain position information: each position randomly initializes a vector that joins the model training, finally obtaining an embedding containing position information. To combine the Position Embeddings with the word embeddings, BERT chooses to splice them directly.
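The learned-position-embedding idea described above can be sketched as follows. This is a minimal, illustrative numpy example, not the patent's implementation: dimensions are arbitrary, and the randomly initialized position vectors stand in for parameters that would be updated during training. It follows the text's description of combining by direct splicing (concatenation).

```python
import numpy as np

rng = np.random.default_rng(0)
seq_len, d_model = 8, 16  # illustrative sequence length and embedding size

# Token embeddings for a toy sequence (in BERT these come from a lookup table).
token_emb = rng.normal(size=(seq_len, d_model))

# Learned position embeddings: one randomly initialized vector per position,
# updated during training rather than computed from a sinusoid.
pos_emb = rng.normal(size=(seq_len, d_model))

# Direct splicing (concatenation) of token and position embeddings,
# as described in the text above.
combined = np.concatenate([token_emb, pos_emb], axis=-1)
print(combined.shape)  # (8, 32)
```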
  • CLS layer (classification): The CLS layer is part of the BERT model and is used for downstream classification tasks. It is mainly used for single text classification tasks and sentence pair classification tasks.
  • The BERT model inserts a [CLS] symbol in front of the text and uses the output vector corresponding to this symbol as the semantic representation of the entire text for text classification. Compared with the other characters/words already present in the text, this symbol without obvious semantic information fuses the semantic information of each character/word in the text more "fairly".
  • The actual application scenarios of the sentence pair classification task include question answering (judging whether a question matches an answer) and sentence matching (judging whether two sentences express the same meaning).
  • For the sentence pair classification task, in addition to adding the [CLS] symbol and using the corresponding output as the semantic representation of the text, the BERT model also uses a [SEP] symbol as a separator between the two input sentences and appends two different segment vectors to the tokens of the two sentences to differentiate them.
  • The Sigmoid function is a common S-shaped function in biology, also known as the S-shaped growth curve. In information science, because it is monotonically increasing and its inverse is also monotonically increasing, it is often used as the activation function of neural networks.
  • The sigmoid function is also called the Logistic function. It is used for the output of hidden-layer neural units and has a value range of (0,1): it can map any real number into the interval (0,1) and can be used for binary classification. It works better when the feature differences are complex or not particularly large.
  • The Softmax function is a normalized exponential function that can "compress" a K-dimensional vector z containing arbitrary real numbers into another K-dimensional real vector, such that each element lies in the range (0,1) and all elements sum to 1. This function is often used in multi-classification problems.
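The two activation functions defined above can be sketched in a few lines of numpy; the max-subtraction in `softmax` is a standard numerical-stability trick, not something the text specifies:

```python
import numpy as np

def sigmoid(x):
    # Maps any real number into (0, 1); usable for binary classification.
    return 1.0 / (1.0 + np.exp(-x))

def softmax(z):
    # Normalized exponential: "compresses" a K-dimensional real vector so
    # every element lies in (0, 1) and all elements sum to 1.
    e = np.exp(z - np.max(z))  # subtract max for numerical stability
    return e / e.sum()

scores = np.array([2.0, 1.0, 0.1])
probs = softmax(scores)
print(sigmoid(0.0))           # 0.5
print(round(probs.sum(), 6))  # 1.0
```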
  • The Attention mechanism, in layman's terms, focuses on important points and ignores other unimportant factors. Attention is divided into spatial attention and temporal attention; the former is used for image processing and the latter for natural language processing. The Attention mechanism in this application is a temporal attention mechanism in natural language processing. The principle of Attention is to calculate the matching degree between the current input sequence and the output vector: the higher the matching degree, the higher the relative score of the focus point. The matching-degree weights calculated by Attention are limited to the current sequence pair.
  • AI (Artificial Intelligence)
  • the embodiments of the present application may acquire and process relevant data based on artificial intelligence technology.
  • Artificial intelligence is the theory, method, technology, and application system that uses digital computers, or machines controlled by digital computers, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use knowledge to obtain the best results.
  • Artificial intelligence basic technologies generally include technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technology, operation/interaction systems, and mechatronics.
  • Artificial intelligence software technology mainly includes computer vision technology, robotics technology, biometrics technology, speech processing technology, natural language processing technology, and machine learning/deep learning.
  • Machine Reading Comprehension (MRC) is a task that tests how well a machine understands natural language by asking it to answer questions based on a given context. It has the potential to revolutionize interaction between humans and machines. MRC is widely used in question answering and dialogue systems.
  • The embodiment of the present application proposes a method for predicting answers to questions, a predicting device, electronic equipment, and a storage medium, which can effectively select the parts of an article related to the question and efficiently delete useless text information, thereby improving the accuracy of predicted answers.
  • the question answer prediction method, prediction device, electronic device, and storage medium provided in the embodiments of the present application are specifically described through the following embodiments. First, the question answer prediction method in the embodiments of the present application is described.
  • the method for predicting the answer to a question provided in the embodiment of the present application relates to the technical field of artificial intelligence.
  • the method for predicting the answer to the question provided by the embodiment of the present application can be applied to the terminal, can also be applied to the server, and can also be software running on the terminal or the server.
  • the terminal can be a smart phone, a tablet computer, a notebook computer, a desktop computer, or a smart watch;
  • the server end can be configured as an independent physical server, as a server cluster composed of multiple physical servers, or as a distributed system;
  • it can also be configured as a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communications, middleware services, domain name services, security services, content delivery networks (Content Delivery Network, CDN), big data, and artificial intelligence platforms; the software can be an application that implements the question answer prediction method, but is not limited to the above forms.
  • The embodiments of the present application can be used in many general-purpose or special-purpose computer system environments or configurations, for example: personal computers, server computers, handheld or portable devices, tablet devices, multiprocessor systems, microprocessor-based systems, set-top boxes, programmable consumer electronics, network PCs, minicomputers, mainframe computers, and distributed computing environments including any of the above systems or devices, etc.
  • This application may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer.
  • program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types.
  • the application may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network.
  • program modules may be located in both local and remote computer storage media including storage devices.
  • Some embodiments of the present application provide a method for predicting answers to questions, including step S100, step S200, step S300, step S400, step S500, step S600, and step S700. These seven steps are described in detail below, and it should be understood that the method for predicting the answer to a question includes but is not limited to these seven steps.
  • Step S100 Obtain the original topic data to be predicted; wherein, the original topic data includes original article data and original question data to be answered.
  • the original topic data may be medical data or other text data. If the original topic data is medical data, the original topic data can be obtained through the medical cloud server, or can be obtained through other channels.
  • Step S200 Encoding the original article data and original question data according to the preset first pre-training model to obtain question encoding vectors and article encoding vectors.
  • the preset first pre-training model may be a BERT model or other neural network models, which is not specifically limited in this application.
  • The letter Q is used to represent the original question data, and the letter P is used to represent the original article data.
  • The original question data Q and the original article data P are input into the BERT model for training, and the hidden states of the last layer of the BERT model are used as the encoding results of Q and P, yielding the question encoding vector and the article encoding vector, recorded as H_Q and H_P respectively, thereby achieving the encoding processing of the original question data and the original article data.
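The data flow of step S200 can be sketched as follows. The `toy_encoder` below is a hypothetical stand-in for the last-layer hidden states of a real BERT model (the names, tokenization, and random values are illustrative assumptions, not the patent's implementation); it only demonstrates the shapes of H_Q and H_P.

```python
import numpy as np

HIDDEN = 768  # hidden size of BERT-base

def toy_encoder(tokens):
    # Stand-in for BERT: returns one last-layer hidden state per token.
    rng = np.random.default_rng(len(tokens))
    return rng.normal(size=(len(tokens), HIDDEN))

Q = ["which", "drug", "treats", "hypertension"]        # original question data
P = ["the", "article", "text", "goes", "here", "etc"]  # original article data

H_Q = toy_encoder(Q)  # question encoding vector, shape (len(Q), HIDDEN)
H_P = toy_encoder(P)  # article encoding vector, shape (len(P), HIDDEN)
print(H_Q.shape, H_P.shape)  # (4, 768) (6, 768)
```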
  • Step S300 Perform attention screening on the question encoding vector and the article encoding vector to obtain multiple candidate texts.
  • Step S400 Perform association processing on the original question data and each candidate text to obtain associated data; wherein the associated data includes a question mark vector, a candidate mark vector corresponding to each candidate text, and an associated value; wherein, the associated value is used to represent the original question The correlation between the data and each candidate text.
  • Step S500 Perform answer screening processing on the question mark vector, multiple candidate texts and each candidate mark vector to obtain the corresponding confidence of each candidate text; where the confidence is used to represent the probability that the candidate text contains the candidate answer.
  • Step S600 Determine the candidate position according to the correlation value, the confidence level and the preset prediction threshold; where the candidate position is the position of the candidate answer.
  • Step S700 Match the corresponding candidate texts according to the candidate positions to obtain candidate answers.
  • Score_1 represents the association value;
  • Score_2 represents the confidence degree.
  • The target score value Score is obtained from the association value and the confidence degree, as determined by formula (1):
  • the candidate positions are determined according to the target score value Score and the preset prediction threshold. If the target score value Score is greater than or equal to the preset prediction threshold, the candidate text is considered to be the position of the candidate answer, otherwise, the candidate text is considered not to contain the candidate answer, and a null character is output.
  • the corresponding candidate text is matched according to the candidate position, and the candidate answer corresponding to the original question data can be obtained.
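Steps S600 and S700 can be sketched as follows. Note that formula (1) itself is not reproduced in the text above, so the combination of Score_1 and Score_2 by multiplication here is a hypothetical stand-in, and `candidate_position` is an illustrative placeholder for the answer's span in the candidate text:

```python
def predict_position(score1, score2, threshold, candidate_position):
    # Hypothetical combination of the association value and the confidence
    # degree; the patent's actual formula (1) is not reproduced in the text.
    score = score1 * score2
    if score >= threshold:
        # Target score meets the preset prediction threshold: the candidate
        # text is considered to hold the position of the candidate answer.
        return candidate_position
    # Otherwise the candidate text is considered not to contain the
    # candidate answer, and a null character is output.
    return ""

print(predict_position(0.9, 0.8, 0.5, (12, 20)))  # (12, 20)
print(predict_position(0.3, 0.4, 0.5, (12, 20)))  # ''
```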
  • The method for predicting the answer to a question in the embodiment of the present application uses the preset first pre-training model to encode the original article data and the original question data to obtain the question encoding vector and the article encoding vector; the question encoding vector and the article encoding vector then undergo attention screening processing to screen out multiple candidate texts; next, the original question data and each candidate text are associated with each other to obtain the association value used to characterize the association between the original question data and each candidate text, the question mark vector corresponding to the original question data, and the candidate mark vector corresponding to each candidate text; then answer screening processing is performed on the question mark vector, candidate texts, and candidate mark vectors to obtain the confidence degree representing the probability that each candidate text contains the candidate answer; finally, the candidate positions are determined according to the association value, the confidence degree, and the preset prediction threshold, so as to determine the candidate answers.
  • Through the attention screening mechanism, association processing, and answer screening processing, useless text information that has nothing to do with the answer can be deleted, and the parts of the article related to the question can be effectively selected.
  • the original article data includes a plurality of original texts
  • each original text includes a plurality of text words
  • the original question data includes a plurality of question words.
  • Step S300 includes step S310, step S320, and step S330. These three steps will be described in detail below with reference to FIG. 2. It should be understood that step S300 includes but is not limited to step S310 to step S330.
  • Step S310 Perform an attention operation on the question encoding vector and the article encoding vector according to the preset first attention model to obtain an attention matrix; wherein the attention matrix includes a plurality of attention values, and each attention value is used to represent how important each text word is to a question word.
  • In step S310 of some embodiments, the original question data and the original article data are encoded through the above step S200; after the question encoding vector H_Q and the article encoding vector H_P are obtained, the preset first attention model performs an attention operation on the question encoding vector H_Q and the article encoding vector H_P.
  • The structure of the first attention model can adopt the match attention method; match attention is shown in formula (2), which is as follows:
  • A represents the calculated attention matrix
  • SoftMax represents the softmax function
  • W is the weight matrix
  • b represents the bias
  • e is a unit vector.
  • The attention matrix of the question encoding vector H_Q and the article encoding vector H_P can be calculated by formula (2).
  • The structure of the first attention model can also use the cross attention calculation method. Its update formula is similar to the self-attention of the Transformer, and it likewise uses the three matrices Q, K, and V to calculate the attention results. The difference is that here K and V are calculated from H_Q, and Q is calculated from H_P. However, the attention result of the final operation has the same format as that of match attention.
  • The dimension of the attention matrix A is consistent with the dimension of H_P(H_Q)^T, and the attention matrix A records the attention of each token embedding of the original question data on the original article data. That is, A_{i,j} can represent the importance of the j-th text word in the original article data for the i-th question word of the original question data; then A_{*,j} = Σ_i A_{i,j} can characterize the importance of the j-th text word in the original article data for the entire original question data.
  • Step S320 Obtain a preset first attention threshold.
  • Step S330 Screen the original text according to the first attention threshold and the attention matrix to obtain multiple candidate texts.
  • In step S330 of some embodiments, since A_{i,j} can represent the importance of the j-th text word in the original article data for the i-th question word of the original question data, the first attention threshold can be used to screen the original texts to obtain multiple candidate texts, where P_A represents the candidate texts.
  • step S330 includes but not limited to step S331 and step S332 , which will be described in detail below in conjunction with FIG. 3 .
  • Step S331 Calculate the attention value of the same text word on the original question data according to the attention matrix, and obtain the corresponding text attention value; wherein, the text attention value is used to represent the importance of the text word to the original question data.
  • Step S332 If the text attention value is greater than the first attention threshold, obtain the original text corresponding to the text word to obtain the corresponding candidate text.
  • If the text attention value is less than or equal to the preset first attention threshold, it means that the text word is not important enough for the original question data, and it is judged as useless text information.
  • Since A_{i,j} can represent the importance of the j-th text word in the original article data for the i-th question word of the original question data, A_{*,j} = Σ_i A_{i,j} can characterize the importance of the j-th text word in the original article data for the entire original question data.
  • The attention values of the same text word on the original question data are summed to obtain the corresponding text attention value, that is, A_{*,j}. If A_{*,j} is greater than the preset first attention threshold, the original text corresponding to that text word is obtained and used as one of the multiple candidate texts. This operation is repeated to obtain multiple candidate texts.
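Steps S310 to S332 can be sketched as follows. Since formula (2) is not fully reproduced in the text, the bilinear form `softmax(H_Q W H_P^T)` below is a simplified stand-in for the match attention computation (the bias term is omitted), and the mean-based threshold is purely illustrative:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(1)
n_q, n_p, d = 3, 6, 8            # question words, text words, hidden size
H_Q = rng.normal(size=(n_q, d))  # question encoding vector
H_P = rng.normal(size=(n_p, d))  # article encoding vector
W = rng.normal(size=(d, d))      # weight matrix of the attention model

# A[i, j]: importance of the j-th text word for the i-th question word
# (simplified stand-in for formula (2)).
A = softmax(H_Q @ W @ H_P.T, axis=-1)

# A_{*,j}: importance of text word j for the entire original question data.
col_importance = A.sum(axis=0)

# Keep text words whose summed attention exceeds the first attention
# threshold; their source texts become the candidate texts.
first_attention_threshold = col_importance.mean()  # illustrative threshold
candidate_idx = np.where(col_importance > first_attention_threshold)[0]
print(candidate_idx)
```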
  • Step S400 includes step S410, step S420, and step S430. These three steps are described in detail below; it should be understood that step S400 includes but is not limited to step S410 to step S430.
  • Step S410 Input the original question data and each candidate text into a preset second pre-training model; wherein, the second pre-training model includes a first neural network and a second neural network.
  • Step S420 Classifying and labeling the original question data and each candidate text through the first neural network to obtain a question label vector and a candidate label vector for each candidate text.
  • Step S430 Perform mapping and classification processing on the question label vector and each candidate label vector through the second neural network to obtain corresponding correlation values.
  • The second pre-training model can again adopt the BERT model, but its parameters are different from those of the BERT model used as the aforementioned first pre-training model.
  • The first neural network is the CLS layer of BERT.
  • The question mark vector and the candidate mark vector are input to the second neural network for fine-tuning, mapping, and classification processing, and the association value Score_1 corresponding to the candidate text is obtained.
  • The score of the association value Score_1 is between 0 and 1; the higher the score, the greater the probability that the candidate answer is in the candidate text.
  • The association value Score_1 determines the relevance between the original question data and the candidate text.
  • the second neural network includes a fully connected layer and an activation classification layer.
  • Step S430 includes step S431 and step S432; these two steps will be described in detail below. It should be understood that step S430 includes but is not limited to step S431 and step S432.
  • Step S431 Perform fully-connected processing on the question mark vector and each candidate mark vector through the fully-connected layer to obtain corresponding fully-connected values.
  • Step S432 Perform activation classification processing on the fully-connected values through the activation classification layer to obtain corresponding associated values.
  • the second neural network includes a fully connected layer and an activation classification layer, and the activation classification layer is a sigmod function.
  • the CLS result of the last layer of the BERT model is fed into the fully connected network for fine-tuning to obtain the fully connected value; the fully connected value is then passed through a sigmoid layer, which outputs the judgment score as the associated value.
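As a minimal sketch of this step, the fully connected value and the sigmoid judgment score can be computed as follows. The vector and parameter values are invented for illustration; a real implementation would operate on the BERT [CLS] hidden state and learned layer weights:

```python
import math

def sigmoid(x):
    """Squash a real value into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def association_score(cls_vector, weights, bias):
    """Fully connected layer over the CLS output, followed by sigmoid.
    Returns Score 1 in (0, 1): the relevance of question and candidate text."""
    fully_connected_value = sum(w * x for w, x in zip(weights, cls_vector)) + bias
    return sigmoid(fully_connected_value)

# Toy 4-dimensional CLS vector and parameters (invented for illustration).
score1 = association_score([0.2, -0.1, 0.4, 0.3], [1.0, 0.5, -0.2, 0.8], 0.1)
```

A higher `score1` means the candidate text is more likely to contain the answer, matching the role of Score 1 described above.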
  • step S500 includes step S510, step S520, and step S530. It should be understood that step S500 includes but is not limited to these three steps, which will be described in detail below.
  • Step S510 Perform attention screening on the question label vector and each candidate label vector by using the preset second attention model to obtain a plurality of texts to be detected.
  • Step S520 Obtain a vector to be detected corresponding to the text to be detected according to the text to be detected and the candidate marker vectors.
  • Step S530 Perform screening and prediction processing on the text to be detected and the vector to be detected through the preset answer prediction model to obtain the corresponding confidence level of the text to be detected.
  • the structure of the second attention model may or may not be consistent with the structure of the foregoing first attention model. Regardless of whether the structure of the two is consistent, the calculation process of the attention result is similar.
  • the question mark vector and the candidate mark vector continue through BERT encoding and are then screened by a layer of attention structure model to obtain multiple texts to be detected. The texts to be detected are then matched against the candidate mark vectors to obtain the vector to be detected corresponding to each text to be detected. Finally, the vector to be detected and the text to be detected are input into the answer prediction model to obtain the confidence corresponding to the text to be detected.
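The attention screening step can be sketched as below, under the assumption of simple dot-product attention between a question vector and candidate sentence vectors followed by a top-k selection; the vectors and the `top_k` value are illustrative only, not the patent's actual attention structure model:

```python
import math

def softmax(scores):
    """Normalize raw attention scores into weights summing to 1."""
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_screen(question_vec, candidate_vecs, top_k=2):
    """Score each candidate against the question by dot-product attention
    and keep the top_k candidate indices as the texts to be detected."""
    scores = [sum(q * c for q, c in zip(question_vec, vec)) for vec in candidate_vecs]
    weights = softmax(scores)
    ranked = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)
    return ranked[:top_k], weights

# Toy vectors (invented for illustration).
q = [0.5, 0.1, 0.8]
candidates = [[0.4, 0.0, 0.9], [0.1, 0.9, 0.1], [0.6, 0.2, 0.7]]
kept, weights = attention_screen(q, candidates, top_k=2)
```

The indices in `kept` correspond to the texts to be detected that are passed on to the answer prediction model.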
  • the answer prediction model includes a first fully connected multi-head network and a second fully connected multi-head network.
  • Step S530 includes but is not limited to step S531, step S532, and step S533.
  • Step S531 Perform start prediction processing on the text to be detected and the vector to be detected through the first fully connected multi-head network to obtain the start prediction position of the text to be detected and the start mark position of the vector to be detected;
  • Step S532 Perform end prediction processing on the text to be detected and the vector to be detected through the second fully connected multi-head network to obtain the predicted end position of the text to be detected and the end mark position of the vector to be detected.
  • Step S533 Obtain the confidence corresponding to the text to be detected according to the start prediction position, the start mark position, the end prediction position and the end mark position.
  • the hidden results corresponding to the candidate text PA are input into two fully connected multi-head networks with the same structure but different parameters (the first fully connected multi-head network and the second fully connected multi-head network), which predict the start position and the end position of the candidate answer, respectively.
  • the output score for judging the start position is denoted s i , and the output score for judging the end position is denoted e i . Denote the indices of the maxima of the two fully connected multi-head networks as start and end, respectively; the candidate answer is then the text span from position start to position end.
  • the first fully connected multi-head network performs start prediction processing on the text to be detected and the vector to be detected, obtaining the start prediction position of the text to be detected and the start mark position of the vector to be detected, and the second fully connected multi-head network performs end prediction processing on the text to be detected and the vector to be detected, obtaining the end prediction position of the text to be detected and the end mark position of the vector to be detected.
  • Score 2 = (s start - s 1 ) + (e end - e 1 )    (3)
  • The goal of Score 2 is similar to that of Score 1 , but its starting point is different: the confidence Score 2 represents the confidence of the extracted answer.
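Equation (3), together with the argmax over the two multi-head outputs, can be sketched as follows. The logit values are invented, and position 0 is assumed to be the first ([CLS]/no-answer) position whose scores play the role of s 1 and e 1 in the formula:

```python
def answer_confidence(start_logits, end_logits):
    """Pick the argmax start/end positions from the two multi-head outputs
    and compute the confidence of equation (3):
        Score 2 = (s_start - s_1) + (e_end - e_1)
    Position 0 is assumed to be the first ([CLS]/no-answer) position."""
    start = max(range(len(start_logits)), key=lambda i: start_logits[i])
    end = max(range(len(end_logits)), key=lambda i: end_logits[i])
    score2 = (start_logits[start] - start_logits[0]) + (end_logits[end] - end_logits[0])
    return start, end, score2

# Invented logits over a 5-token candidate text.
s_logits = [0.1, 0.3, 2.1, 0.2, 0.0]
e_logits = [0.2, 0.1, 0.4, 1.8, 0.3]
start, end, score2 = answer_confidence(s_logits, e_logits)
```

Here the predicted answer span covers token positions `start` through `end`, and `score2` grows the more clearly the span scores dominate the no-answer scores.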
  • the target score value Score is obtained from Score 1 and Score 2 . If the target score value Score is greater than the preset prediction threshold, the candidate A obtained above is used as the candidate answer; if the target score value Score is less than the preset prediction threshold, there is no answer, and a null character is output.
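A hedged sketch of this final decision step follows. The description does not specify how Score 1 and Score 2 are combined into the target score, so the weighted sum and the coefficient `alpha` below are assumptions made for illustration:

```python
def predict_answer(score1, score2, tokens, start, end, threshold=1.0, alpha=0.5):
    """Combine the association value Score 1 and the confidence Score 2 into a
    target score and compare it with the preset prediction threshold.
    The weighted sum with coefficient alpha is an assumption; the source only
    states that the target score is obtained from Score 1 and Score 2."""
    target = alpha * score1 + (1 - alpha) * score2
    if target > threshold:
        return " ".join(tokens[start:end + 1])  # extracted candidate answer
    return ""  # null character: no answer

# Invented tokens and scores for illustration.
tokens = ["machine", "reading", "comprehension", "answer"]
answer = predict_answer(0.9, 3.6, tokens, 1, 2)
no_answer = predict_answer(0.1, 0.2, tokens, 1, 2)
```

When the target score clears the threshold, the token span between the start and end positions is returned; otherwise the empty string stands in for the null character.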
  • some embodiments of the present application also propose a question answer prediction device 800, including an acquisition module 810, an encoding module 820, an attention screening module 830, an association module 840, an answer screening module 850, a processing module 860, and a matching module 870.
  • the obtaining module 810 is used to obtain the original topic data to be predicted; the original topic data includes original article data and original question data to be answered.
  • the encoding module 820 is configured to encode the original article data and the original question data according to the preset first pre-training model to obtain question encoding vectors and article encoding vectors.
  • the attention screening module 830 is used to perform attention screening processing on the question coding vector and the article coding vector to obtain a plurality of candidate texts;
  • the association module 840 is used to associate the original question data with each candidate text to obtain associated data; wherein the associated data includes a question mark vector, a candidate mark vector corresponding to each candidate text, and an associated value; wherein, the associated value is used for Characterize the correlation between the original question data and each candidate text.
  • the answer screening module 850 is used to perform answer screening processing on the question mark vector, the plurality of candidate texts, and each candidate mark vector to obtain a confidence degree corresponding to each candidate text; wherein the confidence degree is used to characterize the probability that the candidate text contains the candidate answer.
  • the processing module 860 is configured to determine the candidate position according to the correlation value, the confidence level and the preset prediction threshold; wherein the candidate position is the position of the candidate answer.
  • the matching module 870 is configured to match corresponding candidate texts according to candidate positions to obtain candidate answers.
  • the question answer prediction device 800 of the embodiment of the present application encodes the original article data and the original question data in the original topic data through the preset first pre-training model to obtain the question encoding vector and the article encoding vector, and then performs attention screening processing on the question encoding vector and the article encoding vector to screen out multiple candidate texts. The original question data and each candidate text are then associated to obtain the associated value used to characterize the relevance of the original question data and each candidate text, the question mark vector corresponding to the original question data, and the candidate mark vector corresponding to the candidate text. Answer screening processing is then performed on the question mark vector, the candidate texts, and the candidate mark vectors to obtain the confidence degree used to characterize the probability that each candidate text contains the candidate answer, and finally the candidate position is determined according to the correlation value, the confidence degree, and the preset prediction threshold, so as to determine the candidate answer.
  • through the attention screening mechanism, the association processing, and the answer screening processing, useless text information that has nothing to do with the answer can be deleted, and the part of the article related to the question can be effectively selected, thereby improving the accuracy of answer prediction.
  • the question answer prediction device of the embodiment of the present application corresponds to the question answer prediction method described above; for the specific prediction process, please refer to that method, which will not be repeated here.
  • the embodiment of the present application also provides an electronic device, including:
  • the program is stored in the memory, and the processor executes the at least one program to implement the question answer prediction method of the present application, wherein the method includes: obtaining the original topic data to be predicted, where the original topic data includes the original article data and the original question data to be answered; encoding the original article data and the original question data according to the preset first pre-training model to obtain the question encoding vector and the article encoding vector; performing attention screening processing on the question encoding vector and the article encoding vector to obtain multiple candidate texts; associating the original question data with each candidate text to obtain associated data, where the associated data includes the question mark vector, the candidate mark vector corresponding to each candidate text, and the associated value, and the associated value is used to characterize the relevance between the original question data and each candidate text; performing answer screening processing on the question mark vector, the multiple candidate texts, and each candidate mark vector to obtain the confidence corresponding to each candidate text, where the confidence is used to characterize the probability that the candidate text contains the candidate answer; determining the candidate position according to the associated value, the confidence, and the preset prediction threshold; and matching the corresponding candidate text according to the candidate position to obtain the candidate answer.
  • Figure 9 illustrates the hardware structure of an electronic device in another embodiment; the electronic device includes:
  • the processor 910 may be implemented by a general-purpose central processing unit (Central Processing Unit, CPU), a microprocessor, an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), or one or more integrated circuits, and is used to execute relevant programs to realize the technical solutions provided by the embodiments of the present application;
  • the memory 920 may be implemented in the form of a read-only memory (Read Only Memory, ROM), a static storage device, a dynamic storage device, or a random access memory (Random Access Memory, RAM).
  • the memory 920 can store operating systems and other application programs.
  • the relevant program codes are stored in the memory 920 and called by the processor 910 to execute the question answer prediction method of the embodiments of the present application;
  • the input/output interface 930 is used to realize information input and output
  • the communication interface 940 is used to realize the communication interaction between the device and other devices, and the communication can be realized through a wired method (such as USB, network cable, etc.), or can be realized through a wireless method (such as a mobile network, WIFI, Bluetooth, etc.);
  • bus 950 to transfer information between various components of the device (eg, processor 910, memory 920, input/output interface 930, and communication interface 940);
  • the processor 910 , the memory 920 , the input/output interface 930 and the communication interface 940 are connected to each other within the device through the bus 950 .
  • the embodiment of the present application also provides a storage medium, the storage medium being a computer-readable storage medium that stores computer-executable instructions, and the computer-executable instructions are used to make a computer execute a question answer prediction method.
  • the question answer prediction method includes: obtaining the original topic data to be predicted, where the original topic data includes the original article data and the original question data to be answered; encoding the original article data and the original question data according to the preset first pre-training model to obtain the question encoding vector and the article encoding vector; performing attention screening processing on the question encoding vector and the article encoding vector to obtain multiple candidate texts; associating the original question data with each candidate text to obtain associated data, where the associated data includes the question mark vector, the candidate mark vector corresponding to each candidate text, and the associated value, and the associated value is used to characterize the relevance between the original question data and each candidate text; performing answer screening processing on the question mark vector, the multiple candidate texts, and each candidate mark vector to obtain the confidence corresponding to each candidate text, where the confidence is used to characterize the probability that the candidate text contains the candidate answer; determining the candidate position according to the associated value, the confidence, and the preset prediction threshold; and matching the corresponding candidate text according to the candidate position to obtain the candidate answer.
  • the computer readable storage medium can be nonvolatile or volatile.
  • memory can be used to store non-transitory software programs and non-transitory computer-executable programs.
  • the memory may include high-speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid-state storage devices.
  • the memory optionally includes memory located remotely from the processor, and these remote memories may be connected to the processor via a network. Examples of the aforementioned networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.
  • the device embodiments described above are only illustrative, and the units described as separate components may or may not be physically separated, that is, they may be located in one place, or may be distributed to multiple network units. Part or all of the modules can be selected according to actual needs to achieve the purpose of the solution of this embodiment.
  • "At least one (item)" means one or more, and "multiple" means two or more.
  • "And/or" describes the association relationship of associated objects, indicating that three relationships can exist; for example, "A and/or B" can mean: only A exists, only B exists, or both A and B exist, where A and B can be singular or plural.
  • the character "/" generally indicates that the contextual objects are in an "or" relationship.
  • "At least one of the following" or similar expressions refers to any combination of these items, including any combination of single or plural items.
  • for example, at least one item (piece) of a, b, or c can mean: a, b, c, "a and b", "a and c", "b and c", or "a and b and c", where a, b, and c can be single or multiple.
  • the disclosed devices and methods may be implemented in other ways.
  • the device embodiments described above are only illustrative.
  • the division of units is only a logical function division. In actual implementation, there may be other division methods.
  • multiple units or components can be combined or integrated into another system, or some features may be ignored or not implemented.
  • the mutual coupling or direct coupling or communication connection shown or discussed may be through some interfaces, and the indirect coupling or communication connection of devices or units may be in electrical, mechanical or other forms.
  • a unit described as a separate component may or may not be physically separated, and a component displayed as a unit may or may not be a physical unit, that is, it may be located in one place, or may be distributed to multiple network units. Part or all of the units can be selected according to actual needs to achieve the purpose of the solution of this embodiment.
  • each functional unit in each embodiment of the present application may be integrated into one processing unit, each unit may exist separately physically, or two or more units may be integrated into one unit.
  • the above-mentioned integrated units can be implemented in the form of hardware or in the form of software functional units.
  • if the integrated unit is realized in the form of a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium.
  • the technical solution of the present application, in essence, or the part contributing to the prior art, or all or part of the technical solution, can be embodied in the form of a software product; the computer software product is stored in a storage medium and includes multiple instructions to make an electronic device (which may be a personal computer, a server, or a network device, etc.) execute all or part of the steps of the methods in the various embodiments of the present application.
  • the aforementioned storage medium includes: a USB flash drive, a mobile hard disk, a read-only memory (Read-Only Memory, ROM), a random access memory (Random Access Memory, RAM), a magnetic disk, an optical disk, or other media that can store programs.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • Human Computer Interaction (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Machine Translation (AREA)

Abstract

A question answer prediction method and prediction apparatus, an electronic device, and a storage medium. The method comprises: acquiring original article data to be predicted and original question data to be answered; encoding the original article data and the original question data according to a preset first pre-training model to obtain a question encoding vector and an article encoding vector; performing attention screening processing on the question encoding vector and the article encoding vector to obtain a plurality of candidate texts; performing association processing on the original question data and the candidate texts to obtain a question mark vector, a candidate mark vector and an association value; performing answer screening processing on the question mark vector, the candidate texts and the candidate mark vector to obtain corresponding confidence; determining a candidate position according to the association value, the confidence and a preset prediction threshold; and matching a corresponding candidate text according to the candidate position to obtain a candidate answer. The method can improve the accuracy of predicting a question answer.

Description

Question answer prediction method, prediction device, electronic device, and storage medium
This application claims priority to the Chinese patent application with application number 202210025867.7, filed with the China Patent Office on January 11, 2022 and entitled "Question answer prediction method, prediction device, electronic device, storage medium", the entire content of which is incorporated into this application by reference.
Technical Field
The present application relates to the field of artificial intelligence technology, and in particular to a question answer prediction method, a prediction device, an electronic device, and a storage medium.
Background Art
Machine reading comprehension aims to allow machines to find the answer to a question in a given text. This is a fundamental application scenario in natural language processing, and machine reading comprehension is widely used in question answering and dialogue systems.
Technical Problem
The following is a technical problem of the prior art recognized by the inventor: methods that use a pre-training model to handle various natural language problems are fairly similar; the usual practice is to input the question together with all candidate paragraphs directly into the model, and then screen out the correct answer span from all paragraphs or judge that there is no answer. However, for the commonly used reading comprehension datasets, for each question the candidate paragraphs inevitably contain a large amount of useless text information; if answer screening is performed directly on these texts, the useless text information will inevitably interfere with the precision of the span in which the correct answer lies, which is not conducive to answer screening.
Technical Solution
In the first aspect, an embodiment of the present application proposes a question answer prediction method, including:
obtaining original topic data to be predicted; wherein the original topic data includes original article data and original question data to be answered;
encoding the original article data and the original question data according to a preset first pre-training model to obtain a question encoding vector and an article encoding vector;
performing attention screening processing on the question encoding vector and the article encoding vector to obtain a plurality of candidate texts;
performing association processing on the original question data and each of the candidate texts to obtain associated data; wherein the associated data includes a question mark vector, a candidate mark vector corresponding to each of the candidate texts, and an associated value; wherein the associated value is used to characterize the relevance between the original question data and each of the candidate texts;
performing answer screening processing on the question mark vector, the plurality of candidate texts, and each of the candidate mark vectors to obtain a confidence degree corresponding to each of the candidate texts; wherein the confidence degree is used to characterize the probability that the candidate text contains a candidate answer;
determining a candidate position according to the associated value, the confidence degree, and a preset prediction threshold; wherein the candidate position is the position of the candidate answer;
matching the corresponding candidate text according to the candidate position to obtain the candidate answer.
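The steps of the first aspect can be sketched as a skeleton pipeline. Every callable below is a hypothetical stand-in for the corresponding model component (the first pre-training model, the attention screening, the association processing, and the answer screening), the combination of the associated value and the confidence into one target score is an assumption, and the toy values are illustrative only:

```python
def predict_question_answer(article_sentences, question, encode, screen,
                            associate, answer_screen, threshold=1.0):
    """Skeleton of the claimed pipeline with stand-in components.
    encode        -> question/article vectors (first pre-training model)
    screen        -> indices of candidate texts (attention screening)
    associate     -> associated value Score 1 for a candidate text
    answer_screen -> (confidence, (start, end)) for a candidate text"""
    q_vec, a_vecs = encode(question, article_sentences)
    candidate_ids = screen(q_vec, a_vecs)
    best = None
    for idx in candidate_ids:
        text = article_sentences[idx]
        score1 = associate(question, text)
        confidence, (start, end) = answer_screen(question, text)
        target = score1 + confidence  # assumed combination of the two scores
        if target > threshold and (best is None or target > best[0]):
            best = (target, text[start:end + 1])
    return best[1] if best else ""  # "" stands for the null character

# Toy stand-ins, purely illustrative:
sentences = ["the sky is blue", "answers live here", "noise noise"]
answer = predict_question_answer(
    sentences, "where do answers live",
    encode=lambda q, s: (None, None),
    screen=lambda qv, av: [1],
    associate=lambda q, t: 0.9,
    answer_screen=lambda q, t: (0.8, (0, 11)),
)
```

The skeleton returns the extracted span of the best-scoring candidate text, or the empty string when no candidate clears the prediction threshold.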
In the second aspect, an embodiment of the present application proposes a question answer prediction device, including:
an acquisition module, configured to acquire original topic data to be predicted; the original topic data includes original article data and original question data to be answered;
an encoding module, configured to encode the original article data and the original question data according to a preset first pre-training model to obtain a question encoding vector and an article encoding vector;
an attention screening module, configured to perform attention screening processing on the question encoding vector and the article encoding vector to obtain a plurality of candidate texts;
an association module, configured to perform association processing on the original question data and each of the candidate texts to obtain associated data; wherein the associated data includes a question mark vector, a candidate mark vector corresponding to each of the candidate texts, and an associated value; wherein the associated value is used to characterize the relevance between the original question data and each of the candidate texts;
an answer screening module, configured to perform answer screening processing on the question mark vector, the plurality of candidate texts, and each of the candidate mark vectors to obtain a confidence degree corresponding to each of the candidate texts; wherein the confidence degree is used to characterize the probability that the candidate text contains a candidate answer;
a processing module, configured to determine a candidate position according to the associated value, the confidence degree, and a preset prediction threshold; wherein the candidate position is the position of the candidate answer;
a matching module, configured to match the corresponding candidate text according to the candidate position to obtain the candidate answer.
In the third aspect, an embodiment of the present application proposes an electronic device, including:
at least one memory;
at least one processor;
at least one program;
the program is stored in the memory, and the processor executes the at least one program to implement a question answer prediction method; wherein the question answer prediction method includes:
obtaining original topic data to be predicted; wherein the original topic data includes original article data and original question data to be answered;
encoding the original article data and the original question data according to a preset first pre-training model to obtain a question encoding vector and an article encoding vector;
performing attention screening processing on the question encoding vector and the article encoding vector to obtain a plurality of candidate texts;
performing association processing on the original question data and each of the candidate texts to obtain associated data; wherein the associated data includes a question mark vector, a candidate mark vector corresponding to each of the candidate texts, and an associated value; wherein the associated value is used to characterize the relevance between the original question data and each of the candidate texts;
performing answer screening processing on the question mark vector, the plurality of candidate texts, and each of the candidate mark vectors to obtain a confidence degree corresponding to each of the candidate texts; wherein the confidence degree is used to characterize the probability that the candidate text contains a candidate answer;
determining a candidate position according to the associated value, the confidence degree, and a preset prediction threshold; wherein the candidate position is the position of the candidate answer;
matching the corresponding candidate text according to the candidate position to obtain the candidate answer.
In the fourth aspect, an embodiment of the present application proposes a storage medium, the storage medium being a computer-readable storage medium storing computer-executable instructions, the computer-executable instructions being used to make a computer execute a question answer prediction method; wherein the question answer prediction method includes:
obtaining original topic data to be predicted; wherein the original topic data includes original article data and original question data to be answered;
encoding the original article data and the original question data according to a preset first pre-training model to obtain a question encoding vector and an article encoding vector;
performing attention screening processing on the question encoding vector and the article encoding vector to obtain a plurality of candidate texts;
performing association processing on the original question data and each of the candidate texts to obtain associated data; wherein the associated data includes a question mark vector, a candidate mark vector corresponding to each of the candidate texts, and an associated value; wherein the associated value is used to characterize the relevance between the original question data and each of the candidate texts;
performing answer screening processing on the question mark vector, the plurality of candidate texts, and each of the candidate mark vectors to obtain a confidence degree corresponding to each of the candidate texts; wherein the confidence degree is used to characterize the probability that the candidate text contains a candidate answer;
determining a candidate position according to the associated value, the confidence degree, and a preset prediction threshold; wherein the candidate position is the position of the candidate answer;
matching the corresponding candidate text according to the candidate position to obtain the candidate answer.
Beneficial Effects
The question answer prediction method, prediction device, electronic device, and storage medium proposed in this application can, through the attention screening mechanism, the association processing, and the answer screening processing, delete useless text information that has nothing to do with the answer and effectively select the part of the article related to the question, thereby improving the accuracy of question answer prediction.
附图说明Description of drawings
图1是本申请实施例提供的问题答案的预测方法的流程图;Fig. 1 is the flowchart of the prediction method of the question answer that the embodiment of the present application provides;
图2是图1中步骤S300的具体方法的流程图;Fig. 2 is the flowchart of the specific method of step S300 in Fig. 1;
图3是图2中步骤S330的具体方法的流程图;Fig. 3 is the flow chart of the concrete method of step S330 in Fig. 2;
图4是图1中步骤S400的具体方法的流程图;Fig. 4 is the flow chart of the specific method of step S400 in Fig. 1;
图5是图4中步骤S430的具体方法的流程图;Fig. 5 is the flowchart of the specific method of step S430 in Fig. 4;
图6是图1中步骤S500的具体方法的流程图;FIG. 6 is a flowchart of a specific method of step S500 in FIG. 1;
图7是图6中步骤S530的具体方法的流程图;FIG. 7 is a flowchart of a specific method of step S530 in FIG. 6;
图8是本申请实施例提供的问题答案的预测装置的模块框图;FIG. 8 is a block diagram of a device for predicting answers to questions provided by an embodiment of the present application;
图9是本申请实施例提供的电子设备的硬件结构示意图。FIG. 9 is a schematic diagram of a hardware structure of an electronic device provided by an embodiment of the present application.
本发明的实施方式Embodiments of the present invention
为了使本申请的目的、技术方案及优点更加清楚明白,以下结合附图及实施例,对本申请进行进一步详细说明。应当理解,此处所描述的具体实施例仅用以解释本申请,并不用于限定本申请。In order to make the purpose, technical solution and advantages of the present application clearer, the present application will be further described in detail below in conjunction with the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are only used to explain the present application, not to limit the present application.
此外,所描述的特征、结构或特性可以以任何合适的方式结合在一个或更多实施例中。在下面的描述中,提供许多具体细节从而给出对本申请的实施例的充分理解。然而,本领域技术人员将意识到,可以实践本申请的技术方案而没有特定细节中的一个或更多,或者可以采用其它的方法、组元、装置、步骤等。在其它情况下,不详细示出或描述公知方法、装置、实现或者操作以避免模糊本申请的各方面。Furthermore, the described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided in order to give a thorough understanding of the embodiments of the application. However, those skilled in the art will appreciate that the technical solutions of the present application may be practiced without one or more of the specific details, or other methods, components, devices, steps, etc. may be employed. In other instances, well-known methods, apparatus, implementations, or operations have not been shown or described in detail to avoid obscuring aspects of the application.
附图中所示的方框图仅仅是功能实体，不一定必须与物理上独立的实体相对应。即，可以采用软件形式来实现这些功能实体，或在一个或多个硬件模块或集成电路中实现这些功能实体，或在不同网络和/或处理器装置和/或微控制器装置中实现这些功能实体。The block diagrams shown in the drawings are merely functional entities and do not necessarily correspond to physically separate entities. That is, these functional entities may be implemented in software, in one or more hardware modules or integrated circuits, or in different network and/or processor devices and/or microcontroller devices.
附图中所示的流程图仅是示例性说明,不是必须包括所有的内容和操作/步骤,也不是必须按所描述的顺序执行。例如,有的操作/步骤还可以分解,而有的操作/步骤可以合并或部分合并,因此实际执行的顺序有可能根据实际情况改变。The flow charts shown in the drawings are only exemplary illustrations, and do not necessarily include all contents and operations/steps, nor must they be performed in the order described. For example, some operations/steps can be decomposed, and some operations/steps can be combined or partly combined, so the actual order of execution may be changed according to the actual situation.
需要说明的是，虽然在装置示意图中进行了功能模块划分，在流程图中示出了逻辑顺序，但是在某些情况下，可以以不同于装置中的模块划分，或流程图中的顺序执行所示出或描述的步骤。说明书和权利要求书及上述附图中的术语“第一”、“第二”等是用于区别类似的对象，而不必用于描述特定的顺序或先后次序。It should be noted that although the functional modules are divided in the schematic diagram of the apparatus and a logical order is shown in the flowchart, in some cases the steps shown or described may be performed with a module division different from that in the apparatus, or in an order different from that in the flowchart. The terms "first", "second", and the like in the specification, the claims, and the above drawings are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence.
除非另有定义,本文所使用的所有的技术和科学术语与属于本申请的技术领域的技术人员通常理解的含义相同。本文中所使用的术语只是为了描述本申请实施例的目的,不是旨在限制本申请。Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the technical field to which this application belongs. The terms used herein are only for the purpose of describing the embodiments of the present application, and are not intended to limit the present application.
首先,对本申请中涉及的若干名词进行解析:First, analyze some nouns involved in this application:
人工智能(artificial intelligence,AI)：是研究、开发用于模拟、延伸和扩展人的智能的理论、方法、技术及应用系统的一门新的技术科学；人工智能是计算机科学的一个分支，它企图了解智能的实质，并生产出一种新的能以与人类智能相似的方式做出反应的智能机器，该领域的研究包括机器人、语言识别、图像识别、自然语言处理和专家系统等。人工智能可以对人的意识、思维的信息过程进行模拟。人工智能还是利用数字计算机或者数字计算机控制的机器模拟、延伸和扩展人的智能，感知环境、获取知识并使用知识获得最佳结果的理论、方法、技术及应用系统。Artificial intelligence (AI): a new technical science that studies and develops theories, methods, technologies, and application systems for simulating, extending, and expanding human intelligence. Artificial intelligence is a branch of computer science that attempts to understand the essence of intelligence and to produce a new kind of intelligent machine that can respond in a manner similar to human intelligence; research in this field includes robotics, speech recognition, image recognition, natural language processing, and expert systems. Artificial intelligence can simulate the information processes of human consciousness and thinking. Artificial intelligence is also the theory, method, technology, and application system that uses digital computers or machines controlled by digital computers to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use knowledge to obtain the best results.
自然语言处理(natural language processing,NLP)：NLP用计算机来处理、理解以及运用人类语言(如中文、英文等)，NLP属于人工智能的一个分支，是计算机科学与语言学的交叉学科，又常被称为计算语言学。自然语言处理包括语法分析、语义分析、篇章理解等。自然语言处理常用于机器翻译、手写体和印刷体字符识别、语音识别及文语转换、信息检索、信息抽取与过滤、文本分类与聚类、舆情分析和观点挖掘等技术领域，它涉及与语言处理相关的数据挖掘、机器学习、知识获取、知识工程、人工智能研究和与语言计算相关的语言学研究等。Natural language processing (NLP): NLP uses computers to process, understand, and apply human languages (such as Chinese and English). NLP is a branch of artificial intelligence and an interdisciplinary field of computer science and linguistics, often called computational linguistics. Natural language processing includes syntactic analysis, semantic analysis, and discourse understanding. It is commonly used in technical fields such as machine translation, handwritten and printed character recognition, speech recognition and text-to-speech conversion, information retrieval, information extraction and filtering, text classification and clustering, and public opinion analysis and opinion mining; it involves data mining, machine learning, knowledge acquisition, knowledge engineering, and artificial intelligence research related to language processing, as well as linguistics research related to language computing.
医疗云(Medical cloud)：医疗云是指在云计算、移动技术、多媒体、4G通信、大数据、以及物联网等新技术基础上，结合医疗技术，使用“云计算”来创建医疗健康服务云平台，实现了医疗资源的共享和医疗范围的扩大。因为云计算技术的运用与结合，医疗云提高了医疗机构的效率，方便居民就医。像现在医院的预约挂号、电子病历、医保等都是云计算与医疗领域结合的产物，医疗云还具有数据安全、信息共享、动态扩展、布局全局的优势。Medical cloud: The medical cloud refers to a medical and health service cloud platform created with "cloud computing" on the basis of new technologies such as cloud computing, mobile technology, multimedia, 4G communication, big data, and the Internet of Things, combined with medical technology, realizing the sharing of medical resources and the expansion of medical coverage. Through the application of cloud computing technology, the medical cloud improves the efficiency of medical institutions and makes it more convenient for residents to seek medical care. Appointment registration, electronic medical records, and medical insurance in today's hospitals are all products of the combination of cloud computing and the medical field. The medical cloud also has the advantages of data security, information sharing, dynamic scalability, and overall layout.
BERT(Bidirectional Encoder Representation from Transformers)模型：BERT模型进一步增加词向量模型泛化能力，充分描述字符级、词级、句子级甚至句间关系特征，基于Transformer构建而成。BERT中有三种embedding，即Token Embedding，Segment Embedding，Position Embedding；其中Token Embeddings是词向量，第一个单词是CLS标志，可以用于之后的分类任务；Segment Embeddings用来区别两种句子，因为预训练不光做LM还要做以两个句子为输入的分类任务；Position Embeddings，这里的位置词向量不是transformer中的三角函数，而是BERT经过训练学到的。BERT直接训练一个Position Embeddings来保留位置信息，每个位置随机初始化一个向量，加入模型训练，最后就得到一个包含位置信息的embedding；在Position Embeddings和word embedding的结合方式上，BERT选择直接拼接。BERT (Bidirectional Encoder Representation from Transformers) model: The BERT model further increases the generalization ability of word vector models, fully describing character-level, word-level, sentence-level, and even inter-sentence relationship features, and is built on the Transformer. There are three kinds of embeddings in BERT, namely Token Embedding, Segment Embedding, and Position Embedding. Token Embeddings are word vectors, and the first token is the CLS mark, which can be used for subsequent classification tasks. Segment Embeddings are used to distinguish two sentences, because pre-training involves not only language modeling but also a classification task that takes two sentences as input. As for Position Embeddings, the position vectors here are not the sinusoidal functions of the Transformer but are learned by BERT during training: BERT directly trains Position Embeddings to retain position information, randomly initializing a vector for each position and training it with the model, finally obtaining an embedding containing position information. As for the way the Position Embeddings and word embeddings are combined, BERT chooses direct concatenation.
CLS层(classification):CLS层是BERT模型的一部分,用于下游的分类任务。主要用于单文本分类任务和语句对分类任务。CLS layer (classification): The CLS layer is part of the BERT model and is used for downstream classification tasks. It is mainly used for single text classification tasks and sentence pair classification tasks.
单文本分类任务:BERT模型在文本前***一个[CLS]符号,并将该符号对应的输出向量作为整篇文本的语义表示,用于文本分类。可以理解为:与文本中已有的其它字/词相比,这个无明显语义信息的符号会更“公平”地融合文本中各个字/词的语义信息。Single text classification task: The BERT model inserts a [CLS] symbol in front of the text, and uses the output vector corresponding to the symbol as the semantic representation of the entire text for text classification. It can be understood that this symbol without obvious semantic information will more "fairly" fuse the semantic information of each word/word in the text compared with other words/words already in the text.
语句对分类任务：语句对分类任务的实际应用场景包括：问答(判断一个问题与一个答案是否匹配)、语句匹配(两句话是否表达同一个意思)等。对于语句对分类任务，BERT模型除了添加[CLS]符号并将对应的输出作为文本的语义表示以外，还对输入的两句话用一个[SEP]符号作为分割，并分别对分割后的两句话附加两个不同文本向量以用作区分。Sentence pair classification task: Practical application scenarios of the sentence pair classification task include question answering (judging whether a question matches an answer) and sentence matching (judging whether two sentences express the same meaning). For the sentence pair classification task, in addition to adding the [CLS] symbol and using the corresponding output as the semantic representation of the text, the BERT model also separates the two input sentences with a [SEP] symbol and appends two different text vectors to the two sentences to distinguish them.
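As a minimal illustration of the sentence-pair input format described above, the [CLS]/[SEP] sequence and its segment ids can be assembled as follows (plain Python; the token lists are hypothetical word pieces for illustration, not the output of a real BERT tokenizer):

```python
def build_pair_input(question_tokens, passage_tokens):
    """Assemble a BERT-style sentence-pair input:
    [CLS] question [SEP] passage [SEP], plus segment ids
    (0 for the first sentence incl. [CLS] and its [SEP], 1 for the second)."""
    tokens = ["[CLS]"] + question_tokens + ["[SEP]"] + passage_tokens + ["[SEP]"]
    segment_ids = [0] * (len(question_tokens) + 2) + [1] * (len(passage_tokens) + 1)
    return tokens, segment_ids

tokens, segs = build_pair_input(["what", "is", "ai"], ["ai", "is", "a", "science"])
print(tokens)
# ['[CLS]', 'what', 'is', 'ai', '[SEP]', 'ai', 'is', 'a', 'science', '[SEP]']
print(segs)
# [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
```

The segment ids play the role of the two different text vectors appended to the two sentences to distinguish them.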
sigmoid函数：sigmoid函数是一个在生物学中常见的S型函数，也称为S型生长曲线。在信息科学中，由于其单增以及反函数单增等性质，常被用作神经网络的激活函数。sigmoid函数也叫作Logistic函数，用于隐层神经单元输出，取值范围为(0,1)，它可以将一个实数映射到(0,1)的区间，可以用来做二分类。在特征相差比较复杂或者相差不是特别大的时候效果比较好。Sigmoid function: The sigmoid function is an S-shaped function common in biology, also known as the S-shaped growth curve. In information science, because it is monotonically increasing and its inverse function is also monotonically increasing, it is often used as the activation function of neural networks. The sigmoid function, also called the logistic function, is used for the output of hidden-layer neural units; its value range is (0,1), it can map any real number into the interval (0,1), and it can be used for binary classification. It works well when the feature differences are relatively complex or not particularly large.
Softmax函数：Softmax函数是归一化指数函数，能将一个含任意实数的K维向量z“压缩”到另一个K维实向量中，使得每一个元素的范围都在(0,1)之间，并且所有元素的和为1，该函数常用于多分类问题中。Softmax function: The softmax function is the normalized exponential function. It can "compress" a K-dimensional vector z of arbitrary real numbers into another K-dimensional real vector such that each element lies in the range (0,1) and all elements sum to 1. This function is often used in multi-classification problems.
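The two functions defined above can be sketched in a few lines of numpy (a generic textbook implementation, not code from the application):

```python
import numpy as np

def sigmoid(x):
    """Logistic (sigmoid) function: maps any real number into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

def softmax(z):
    """Normalized exponential: 'compresses' a K-dimensional real vector so
    that every element lies in (0, 1) and all elements sum to 1."""
    e = np.exp(z - np.max(z))  # subtract the max for numerical stability
    return e / e.sum()

print(sigmoid(0.0))                      # 0.5
p = softmax(np.array([1.0, 2.0, 3.0]))
print(p.sum())                           # 1.0
```

Sigmoid suits binary decisions (for example, a per-candidate confidence score), while softmax normalizes a vector of scores into a distribution over many classes.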
Attention机制:Attention机制即为注意力机制,Attention机制通俗的讲就是把注意力集中放在重要的点上,而忽略其他不重要的因素。Attention分为空间注意力和时间注意力,前者用于图像处理,后者用于自然语言处理。在本申请中的Attention机制为自然语言处理方面的时间注意力机制。Attention的原理就是计算当前输入序列与输出向量的匹配程度,匹配程度高也就是注意力集中点其相对的得分越高,其中Attention计算得到的匹配度权重,只限于当前序列对。Attention mechanism: Attention mechanism is the attention mechanism. In layman's terms, the Attention mechanism is to focus on important points and ignore other unimportant factors. Attention is divided into spatial attention and temporal attention. The former is used for image processing and the latter is used for natural language processing. The Attention mechanism in this application is a temporal attention mechanism in natural language processing. The principle of Attention is to calculate the matching degree between the current input sequence and the output vector. The higher the matching degree is, the higher the relative score of the focus point is. The matching degree weight calculated by Attention is limited to the current sequence pair.
本申请实施例可以基于人工智能技术对相关的数据进行获取和处理。其中，人工智能(Artificial Intelligence,AI)是利用数字计算机或者数字计算机控制的机器模拟、延伸和扩展人的智能，感知环境、获取知识并使用知识获得最佳结果的理论、方法、技术及应用系统。The embodiments of the present application may acquire and process relevant data based on artificial intelligence technology. Artificial intelligence (AI) is the theory, method, technology, and application system that uses digital computers or machines controlled by digital computers to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use knowledge to obtain the best results.
人工智能基础技术一般包括如传感器、专用人工智能芯片、云计算、分布式存储、大数据处理技术、操作/交互***、机电一体化等技术。人工智能软件技术主要包括计算机视觉技术、机器人技术、生物识别技术、语音处理技术、自然语言处理技术以及机器学习/深度学习等几大方向。Artificial intelligence basic technologies generally include technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technology, operation/interaction systems, and mechatronics. Artificial intelligence software technology mainly includes computer vision technology, robotics technology, biometrics technology, speech processing technology, natural language processing technology, and machine learning/deep learning.
机器阅读理解(Machine Reading Comprehension,MRC)：机器阅读理解(MRC)是一项通过让机器回答基于给定上下文的问题来测试机器理解自然语言的程度的任务，它有可能彻底改变人类和机器之间的互动方式。MRC被广泛应用于问答和对话系统。Machine Reading Comprehension (MRC): Machine reading comprehension (MRC) is a task that tests how well a machine understands natural language by asking it to answer questions based on a given context. It has the potential to revolutionize the way humans and machines interact. MRC is widely used in question answering and dialogue systems.
相关技术中，采用预训练模型来处理各种自然语言问题的方法都较为类似，通常的做法都是直接将问题与全部的候选段落输入模型中，然后在全部段落中筛选出正确的答案区间或者判断无答案。然而，对于目前常用的阅读理解数据集来讲，针对每一个问题，候选段落中必然存在大量的无用文本信息，如果直接对这些文本进行答案筛选的话，那么无用的文本信息必然会干扰到正确答案所处区间的精度，不利于答案的筛选。In the related art, the methods that use pre-trained models to handle various natural language questions are rather similar: the usual practice is to input the question together with all candidate paragraphs into the model, and then either screen out the correct answer span from all the paragraphs or determine that there is no answer. However, for the reading comprehension datasets commonly used at present, the candidate paragraphs for each question necessarily contain a large amount of useless text information; if answer screening is performed directly on these texts, the useless text information will inevitably interfere with the accuracy of the span in which the correct answer lies, which is not conducive to answer screening.
基于此，本申请实施例提出一种问题答案的预测方法、预测装置、电子设备、存储介质，能够有效地选择出文章中和问题有关的部分，高效率地删减掉无用文本信息，从而提高预测答案的准确性。Based on this, the embodiments of the present application propose a question answer prediction method, prediction apparatus, electronic device, and storage medium that can effectively select the parts of an article that are relevant to the question and efficiently prune useless text information, thereby improving the accuracy of the predicted answer.
本申请实施例提供的问题答案的预测方法、预测装置、电子设备、存储介质,具体通过如下实施例进行说明,首先描述本申请实施例中的问题答案的预测方法。The question answer prediction method, prediction device, electronic device, and storage medium provided in the embodiments of the present application are specifically described through the following embodiments. First, the question answer prediction method in the embodiments of the present application is described.
本申请实施例提供的问题答案的预测方法，涉及人工智能技术领域。本申请实施例提供的问题答案的预测方法可应用于终端中，也可应用于服务器端中，还可以是运行于终端或服务器端中的软件。在一些实施例中，终端可以是智能手机、平板电脑、笔记本电脑、台式计算机或者智能手表等；服务器端可以配置成独立的物理服务器，也可以配置成多个物理服务器构成的服务器集群或者分布式系统，还可以配置成提供云服务、云数据库、云计算、云函数、云存储、网络服务、云通信、中间件服务、域名服务、安全服务、内容分发网络(Content Delivery Network,CDN)以及大数据和人工智能平台等基础云计算服务的云服务器；软件可以是实现问题答案的预测方法的应用等，但并不局限于以上形式。The question answer prediction method provided in the embodiments of the present application relates to the technical field of artificial intelligence. It can be applied to a terminal, to a server, or as software running on a terminal or server. In some embodiments, the terminal may be a smartphone, tablet computer, notebook computer, desktop computer, smart watch, or the like; the server may be configured as an independent physical server, as a server cluster or distributed system composed of multiple physical servers, or as a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communications, middleware services, domain name services, security services, content delivery networks (CDN), and big data and artificial intelligence platforms; the software may be an application implementing the question answer prediction method, but is not limited to the above forms.
本申请实施例可用于众多通用或专用的计算机系统环境或配置中。例如：个人计算机、服务器计算机、手持设备或便携式设备、平板型设备、多处理器系统、基于微处理器的系统、机顶盒、可编程的消费计算机设备、网络PC、小型计算机、大型计算机、包括以上任何系统或设备的分布式计算环境等等。本申请可以在由计算机执行的计算机可执行指令的一般上下文中描述，例如程序模块。一般地，程序模块包括执行特定任务或实现特定抽象数据类型的例程、程序、对象、组件、数据结构等等。也可以在分布式计算环境中实践本申请，在这些分布式计算环境中，由通过通信网络而被连接的远程处理设备来执行任务。在分布式计算环境中，程序模块可以位于包括存储设备在内的本地和远程计算机存储介质中。The embodiments of the present application can be used in numerous general-purpose or special-purpose computer system environments or configurations, for example: personal computers, server computers, handheld or portable devices, tablet devices, multiprocessor systems, microprocessor-based systems, set-top boxes, programmable consumer computing devices, network PCs, minicomputers, mainframe computers, distributed computing environments including any of the above systems or devices, and so on. The application may be described in the general context of computer-executable instructions executed by a computer, such as program modules. Generally, program modules include routines, programs, objects, components, data structures, and the like that perform particular tasks or implement particular abstract data types. The application may also be practiced in distributed computing environments where tasks are performed by remote processing devices linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media, including storage devices.
请参照图1,结合图1对本申请实施例的问题答案的预测方法的具体处理过程进行详细介绍。如图1所示,第一方面,本申请的一些实施例提供了一种问题答案的预测方法,包括步骤S100、步骤S200、步骤S300、步骤S400、步骤S500、步骤S600和步骤S700。下面对这七个步骤进行详细描述,应理解,问题答案的预测方法包括但不限于这七个步骤。Please refer to FIG. 1 , and in conjunction with FIG. 1 , the specific processing process of the method for predicting the answer to the question in the embodiment of the present application will be introduced in detail. As shown in FIG. 1 , in the first aspect, some embodiments of the present application provide a method for predicting answers to questions, including step S100 , step S200 , step S300 , step S400 , step S500 , step S600 and step S700 . These seven steps are described in detail below, and it should be understood that the method for predicting the answer to a question includes but is not limited to these seven steps.
步骤S100:获取待预测的原始题目数据;其中,原始题目数据包括原始文章数据和待回答的原始问题数据。Step S100: Obtain the original topic data to be predicted; wherein, the original topic data includes original article data and original question data to be answered.
在步骤S100中,原始题目数据可以是医疗数据,也可以是其他的文本数据。如果原始题目数据是医疗数据,该原始题目数据可以通过医疗云服务器获得,也可以通过其他的渠道获取。In step S100, the original topic data may be medical data or other text data. If the original topic data is medical data, the original topic data can be obtained through the medical cloud server, or can be obtained through other channels.
步骤S200:根据预设的第一预训练模型对原始文章数据和原始问题数据进行编码处理, 得到问题编码向量和文章编码向量。Step S200: Encoding the original article data and original question data according to the preset first pre-training model to obtain question encoding vectors and article encoding vectors.
在一些实施例的步骤S200中，预设的第一预训练模型可以是BERT模型，也可以是其他的神经网络模型，对此，本申请不作具体限制。以BERT模型为例，用字母Q表示原始问题数据，字母P表示原始文章数据，将原始问题数据Q和原始文章数据P输入到BERT模型中进行训练，将BERT模型最后一层输出的隐藏状态作为原始问题数据Q和原始文章数据P的编码结果，得到问题编码向量和文章编码向量，并将问题编码向量和文章编码向量分别记为H_Q和H_P，从而实现了对原始问题数据和原始文章数据的编码处理。In step S200 of some embodiments, the preset first pre-training model may be a BERT model or another neural network model, which is not specifically limited in this application. Taking the BERT model as an example, let the letter Q denote the original question data and the letter P denote the original article data. The original question data Q and the original article data P are input into the BERT model for training, and the hidden states output by the last layer of the BERT model are taken as the encoding results of the original question data Q and the original article data P, yielding the question encoding vector and the article encoding vector, denoted H_Q and H_P respectively, thereby completing the encoding of the original question data and the original article data.
步骤S300:对问题编码向量和文章编码向量进行注意力筛选处理,得到多个候选文本。Step S300: Perform attention screening on the question encoding vector and the article encoding vector to obtain multiple candidate texts.
步骤S400：对原始问题数据和每一候选文本进行关联处理，得到关联数据；其中关联数据包括问题标记向量、对应每一候选文本的候选标记向量、关联值；其中，关联值用于表征原始问题数据和每一候选文本之间的关联性。Step S400: Perform association processing on the original question data and each candidate text to obtain associated data; the associated data includes a question mark vector, a candidate mark vector corresponding to each candidate text, and an association value, where the association value is used to characterize the association between the original question data and each candidate text.
步骤S500:对问题标记向量、多个候选文本和每一候选标记向量进行答案筛选处理,得到每一候选文本对应的置信度;其中,置信度用于表征候选文本包含候选答案的概率。Step S500: Perform answer screening processing on the question mark vector, multiple candidate texts and each candidate mark vector to obtain the corresponding confidence of each candidate text; where the confidence is used to represent the probability that the candidate text contains the candidate answer.
步骤S600:根据关联值、置信度和预设的预测阈值确定候选位置;其中候选位置为候选答案所处位置。Step S600: Determine the candidate position according to the correlation value, the confidence level and the preset prediction threshold; where the candidate position is the position of the candidate answer.
步骤S700:根据候选位置匹配对应的候选文本,得到候选答案。Step S700: Match the corresponding candidate texts according to the candidate positions to obtain candidate answers.
在一些实施例的步骤S600中，以Score_1表示关联值，以Score_2表示置信度。首先根据关联值和置信度得到目标分数值Score，目标分数值通过公式(1)确定，公式(1)为：In step S600 of some embodiments, Score_1 denotes the association value and Score_2 denotes the confidence degree. First, the target score value Score is obtained from the association value and the confidence degree; it is determined by formula (1):
Score = Score_1 + Score_2     (1)
得到目标分数值Score后,根据目标分数值Score和预设的预测阈值确定候选位置。如果目标分数值Score大于或等于预设的预测阈值,则认为该候选文本即为候选答案所处的位置,否则,则认为该候选文本不包含候选答案,并输出空字符。After the target score value Score is obtained, the candidate positions are determined according to the target score value Score and the preset prediction threshold. If the target score value Score is greater than or equal to the preset prediction threshold, the candidate text is considered to be the position of the candidate answer, otherwise, the candidate text is considered not to contain the candidate answer, and a null character is output.
确定候选位置后,根据该候选位置匹配对应的候选文本,即可得到原始问题数据对应的候选答案。After the candidate position is determined, the corresponding candidate text is matched according to the candidate position, and the candidate answer corresponding to the original question data can be obtained.
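The decision logic of steps S600–S700 can be sketched as follows (an illustrative reading in which the highest-scoring candidate at or above the threshold is returned; the function and variable names are hypothetical, not from the application):

```python
def predict_answer(candidates, score1, score2, threshold):
    """Pick the candidate text whose Score = Score_1 + Score_2 is highest
    and >= the preset prediction threshold; otherwise return the empty
    string (no answer), as described for steps S600-S700."""
    best_pos, best_score = None, float("-inf")
    for pos, (s1, s2) in enumerate(zip(score1, score2)):
        score = s1 + s2                      # formula (1): Score = Score_1 + Score_2
        if score >= threshold and score > best_score:
            best_pos, best_score = pos, score
    return "" if best_pos is None else candidates[best_pos]

cands = ["text A", "text B", "text C"]
print(predict_answer(cands, [0.2, 0.6, 0.1], [0.3, 0.5, 0.2], threshold=1.0))  # text B
print(predict_answer(cands, [0.1, 0.1, 0.1], [0.1, 0.1, 0.1], threshold=1.0))  # '' (no answer)
```

Returning the empty string mirrors the null character output for candidates whose score falls below the prediction threshold.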
本申请实施例的问题答案的预测方法，通过预设的第一预训练模型对原始题目数据中的原始文章数据和原始问题数据进行编码处理，得到问题编码向量和文章编码向量，然后对问题编码向量和文章编码向量进行注意力筛选处理，以筛选得到多个候选文本，然后对原始问题数据和每一个候选文本进行关联处理，得到用于表征原始问题数据和每一候选文本关联性的关联值、原始问题数据对应的问题标记向量和候选文本对应的候选标记向量；再对问题标记向量、候选文本和候选标记向量进行答案筛选处理，得到用于表征每一候选文本包含候选答案概率的置信度，最后根据关联值、置信度和预设的预测阈值确定候选位置，从而确定候选答案。通过注意力筛选机制、关联处理和答案筛选处理，能够将与答案毫无关系的无用文本信息删除，有效地选择出文章中和问题有关的部分，从而提高预测答案的准确性。In the question answer prediction method of the embodiments of the present application, the original article data and original question data in the original topic data are encoded by the preset first pre-training model to obtain a question encoding vector and an article encoding vector; attention screening is then performed on the question encoding vector and the article encoding vector to obtain multiple candidate texts; the original question data and each candidate text are then associated to obtain the association value characterizing the association between the original question data and each candidate text, the question mark vector corresponding to the original question data, and the candidate mark vector corresponding to each candidate text; answer screening is then performed on the question mark vector, the candidate texts, and the candidate mark vectors to obtain the confidence degree characterizing the probability that each candidate text contains the candidate answer; finally, the candidate position is determined according to the association value, the confidence degree, and the preset prediction threshold, so as to determine the candidate answer. Through the attention screening mechanism, association processing, and answer screening processing, useless text information that has nothing to do with the answer can be deleted and the parts of the article relevant to the question can be effectively selected, thereby improving the accuracy of the predicted answer.
请参照图2,在一些实施例中,原始文章数据包括多个原始文本,每一原始文本包括多个文本单词,原始问题数据包括多个问题单词。Referring to FIG. 2 , in some embodiments, the original article data includes a plurality of original texts, each original text includes a plurality of text words, and the original question data includes a plurality of question words.
在一些实施例的步骤S300包括步骤S310、步骤S320和步骤S330,下面结合图2对这两个步骤进行详细介绍,应理解,步骤S300包括但不限于步骤S310至步骤S330。In some embodiments, step S300 includes step S310, step S320, and step S330. These two steps will be described in detail below with reference to FIG. 2. It should be understood that step S300 includes but not limited to step S310 to step S330.
步骤S310:根据预设的第一注意力模型对问题编码向量和文章编码向量进行注意力运算,得到注意力矩阵;其中,注意力矩阵包括多个注意力值,每一注意力值用于表征每一文本单词对问题单词的重要程度。Step S310: Perform attention operation on the question encoding vector and the article encoding vector according to the preset first attention model to obtain an attention matrix; wherein, the attention matrix includes a plurality of attention values, and each attention value is used to represent How important each text word is to the question word.
在一些实施例的步骤S310中，通过上述步骤S200对原始问题数据和原始文章数据进行编码处理，得到问题编码向量H_Q和文章编码向量H_P后，通过预设的第一注意力模型对问题编码向量H_Q和文章编码向量H_P进行注意力运算。其中，第一注意力模型的结构可以采取match attention的方式，match attention如公式(2)所示，公式(2)具体如下：In step S310 of some embodiments, after the original question data and the original article data are encoded through the above step S200 to obtain the question encoding vector H_Q and the article encoding vector H_P, an attention operation is performed on the question encoding vector H_Q and the article encoding vector H_P by the preset first attention model. The structure of the first attention model may adopt match attention, which is shown in formula (2) as follows:
A = SoftMax((H_P·W + b⊗e)(H_Q)^T)     (2)
在公式(2)中，A表示计算得到的注意力矩阵，SoftMax表示softmax函数，W是权重矩阵，b表示偏置，·表示内积操作，e是单位向量。通过公式(2)能够计算得到问题编码向量H_Q和文章编码向量H_P的注意力矩阵。In formula (2), A denotes the computed attention matrix, SoftMax denotes the softmax function, W is a weight matrix, b denotes a bias, · denotes the inner product operation, and e is a unit vector. The attention matrix of the question encoding vector H_Q and the article encoding vector H_P can be computed by formula (2).
需要说明的是，第一注意力模型的结构也可以采用cross attention的运算方法，其更新公式类似于transformer的self-attention，也是采用Q、K、V三个矩阵来运算注意力结果。不同的是，在此以H_Q来计算K和V，以H_P来计算Q。但是，最终运算的注意力结果和match attention的格式相同。It should be noted that the structure of the first attention model may also adopt the cross attention operation. Its update formula is similar to the self-attention of the Transformer, likewise using the three matrices Q, K, and V to compute the attention result. The difference is that here K and V are computed from H_Q, while Q is computed from H_P. The attention result of the final operation, however, has the same format as that of match attention.
在本实施例中，注意力矩阵A的维度和H_P(H_Q)^T的维度一致，注意力矩阵A记录了原始问题数据的每一个token embedding在原始文章数据上的注意力情况。即A_{i,j}可以表征原始文章数据中的第j个文本单词对原始问题数据的第i个问题单词的重要程度，那么A_{*,j} = Σ_i A_{i,j}就可以表征原始文章数据中的第j个文本单词在整个原始问题数据的重要程度。In this embodiment, the dimension of the attention matrix A is consistent with that of H_P(H_Q)^T, and A records the attention of each token embedding of the original question data over the original article data. That is, A_{i,j} characterizes the importance of the j-th text word in the original article data to the i-th question word of the original question data, so A_{*,j} = Σ_i A_{i,j} characterizes the importance of the j-th text word to the entire original question data.
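One common form of match attention consistent with this description is A = SoftMax((H_P·W + b)(H_Q)^T), with the softmax normalizing each question word's attention over the article. A numpy sketch follows (the shapes and the exact placement of W and b are assumptions for illustration, not the application's definitive formula):

```python
import numpy as np

def row_softmax(z):
    """Softmax along the last axis, with the max subtracted for stability."""
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def match_attention(H_Q, H_P, W, b):
    """One common match-attention form: score article words against question
    words, then softmax over the article axis so that A[i, j] is the weight
    of article word j for question word i."""
    scores = (H_P @ W + b) @ H_Q.T        # (n_p, n_q)
    return row_softmax(scores.T)          # (n_q, n_p)

rng = np.random.default_rng(0)
n_q, n_p, d = 4, 6, 8                     # question length, article length, hidden size
H_Q, H_P = rng.normal(size=(n_q, d)), rng.normal(size=(n_p, d))
W, b = rng.normal(size=(d, d)), rng.normal(size=(d,))

A = match_attention(H_Q, H_P, W, b)
word_importance = A.sum(axis=0)           # A_{*,j}: importance of article word j
print(A.shape)                            # (4, 6)
print(np.allclose(A.sum(axis=1), 1.0))    # True: each question row sums to 1
```

Summing A over its question axis yields exactly the per-word importance A_{*,j} used for the subsequent screening.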
步骤S320:获取预设的第一注意力阈值。Step S320: Obtain a preset first attention threshold.
步骤S330:根据第一注意力阈值和注意力矩阵对原始文本进行筛选处理,得到多个候选文本。Step S330: Screen the original text according to the first attention threshold and the attention matrix to obtain multiple candidate texts.
在一些实施例的步骤S330中，由于A_{i,j}可以表征原始文章数据中的第j个文本单词对原始问题数据的第i个问题单词的重要程度，那么可以根据第一注意力阈值对原始文本进行筛选，以得到多个候选文本，在此以P_A表示候选文本。In step S330 of some embodiments, since A_{i,j} characterizes the importance of the j-th text word in the original article data to the i-th question word of the original question data, the original texts can be screened according to the first attention threshold to obtain multiple candidate texts, denoted here by P_A.
参照图3,在一些实施例中,步骤S330包括但不限于步骤S331和步骤S332,下面结合图3对这两个步骤进行详细描述。Referring to FIG. 3 , in some embodiments, step S330 includes but not limited to step S331 and step S332 , which will be described in detail below in conjunction with FIG. 3 .
步骤S331:根据注意力矩阵计算同一文本单词对原始问题数据的注意力值,得到对应的文本注意力值;其中,文本注意力值用于表征文本单词对原始问题数据的重要程度。Step S331: Calculate the attention value of the same text word on the original question data according to the attention matrix, and obtain the corresponding text attention value; wherein, the text attention value is used to represent the importance of the text word to the original question data.
步骤S332:若文本注意力值大于第一注意力阈值,则获取文本单词对应的原始文本,以得到对应的候选文本。Step S332: If the text attention value is greater than the first attention threshold, obtain the original text corresponding to the text word to obtain the corresponding candidate text.
在一些实施例,若文本注意力值小于或者等于预设的第一注意力阈值,说明该文本单词对于原始问题数据来说的重要性不够,将其判定为无用的文本信息。In some embodiments, if the text attention value is less than or equal to the preset first attention threshold, it means that the text word is not important enough for the original question data, and it is judged as useless text information.
Specifically, as mentioned in step S310, A_{i,j} represents the importance of the j-th text word of the original article data to the i-th question word of the original question data, so

A_{*,j} = Σ_i A_{i,j}

represents the importance of the j-th text word of the original article data to the original question data as a whole. Here, the attention value of each text word with respect to the original question data is calculated to obtain the corresponding text attention value, namely A_{*,j}. If A_{*,j} is greater than the preset first attention threshold, the original text corresponding to that text word is obtained and used as one of the multiple candidate texts. Repeating this operation yields the multiple candidate texts.
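The threshold screening of steps S331/S332 can be sketched as a simple filter. The word list, the attention values, and the threshold value below are all hypothetical placeholders.

```python
import numpy as np

# Hypothetical per-article-word aggregated attention values A_star[j]
# (see step S310) and the words they belong to.
article_words = ["the", "model", "predicts", "answers", "from", "long", "passages"]
A_star = np.array([0.2, 1.4, 0.9, 1.8, 0.1, 0.3, 1.1])
first_attention_threshold = 0.8  # preset threshold (assumed value)

# Keep only words whose aggregated attention exceeds the threshold; the
# original text fragments containing them become the candidate texts P_A.
candidate_texts = [w for w, a in zip(article_words, A_star)
                   if a > first_attention_threshold]
print(candidate_texts)  # → ['model', 'predicts', 'answers', 'passages']
```

Words at or below the threshold ("the", "from", "long") are discarded as useless text information.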
Referring to FIG. 4, in some embodiments, step S400 includes step S410, step S420, and step S430. These three steps are described in detail below; it should be understood that step S400 includes, but is not limited to, steps S410 to S430.

Step S410: Input the original question data and each candidate text into a preset second pre-training model, where the second pre-training model includes a first neural network and a second neural network.

Step S420: Perform classification-labeling processing on the original question data and each candidate text through the first neural network to obtain a question label vector and a candidate label vector for each candidate text.

Step S430: Perform mapping-classification processing on the question label vector and each candidate label vector through the second neural network to obtain the corresponding correlation value.

Specifically, in this embodiment, the second pre-training model may again be a BERT model, but its parameters differ from those of the BERT model used as the first pre-training model. If a BERT model is used, the first neural network is BERT's CLS layer. The original question data H_Q and a candidate text P_A are input into the BERT model, and the CLS result of the last layer is taken to obtain the question label vector and the candidate label vector. The question label vector and the candidate label vector are then input into the second neural network for fine-tuning and mapping-classification processing to obtain the correlation value Score_1 of the candidate text. Score_1 lies between 0 and 1; the higher the score, the greater the probability that the candidate answer is in this candidate text. Score_1 judges the relatedness between the original question data and the candidate text.
Referring to FIG. 5, in some embodiments, the second neural network includes a fully connected layer and an activation classification layer.

Step S430 includes step S431 and step S432. These two steps are described in detail below; it should be understood that step S430 includes, but is not limited to, step S431 and step S432.

Step S431: Perform fully connected processing on the question label vector and each candidate label vector through the fully connected layer to obtain the corresponding fully connected value.

Step S432: Perform activation-classification processing on the fully connected value through the activation classification layer to obtain the corresponding correlation value.

Specifically, in this embodiment, the second neural network includes a fully connected layer and an activation classification layer, and the activation classification layer is a sigmoid function. The CLS result of the last layer of the BERT model is fed into the fully connected network for fine-tuning to obtain the fully connected value, and the fully connected value is then passed through a sigmoid function layer to output the judgment score, obtaining the correlation value.
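The fully connected layer plus sigmoid head of steps S431/S432 can be sketched directly. The CLS vector and the weights below are random placeholders standing in for the fine-tuned parameters.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Hypothetical pooled [CLS] output and fully connected layer parameters.
rng = np.random.default_rng(1)
cls_vector = rng.standard_normal(8)   # last-layer CLS result (hidden size 8)
W = rng.standard_normal((8, 1))       # fully connected layer (fine-tuned in practice)
b = np.zeros(1)

# Step S431: fully connected value; step S432: sigmoid activation → Score_1.
fully_connected_value = cls_vector @ W + b
score_1 = float(sigmoid(fully_connected_value)[0])
```

Because of the sigmoid, `score_1` always lies strictly between 0 and 1, matching the description of Score_1 above.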
Referring to FIG. 6, in some embodiments, step S500 includes step S510, step S520, and step S530. It should be understood that step S500 includes, but is not limited to, these three steps, which are described in detail below.

Step S510: Perform attention-screening processing on the question label vector and each candidate label vector through a preset second attention model to obtain multiple texts to be detected.

Step S520: Obtain a vector to be detected corresponding to each text to be detected according to the text to be detected and the candidate label vectors.

Step S530: Perform screening-prediction processing on the text to be detected and the vector to be detected through a preset answer prediction model to obtain the confidence corresponding to the text to be detected.

Specifically, in this embodiment, the structure of the second attention model may or may not be the same as that of the aforementioned first attention model; in either case, the computation of the attention result is similar. The question label vector and the candidate label vectors continue through BERT encoding and are then screened by a layer of attention structure to obtain multiple texts to be detected. The candidate label vectors are then matched against each text to be detected to obtain the corresponding vector to be detected. Finally, the vector to be detected and the text to be detected are input into the answer prediction model to obtain the confidence corresponding to the text to be detected.
Referring to FIG. 7, in some embodiments, the answer prediction model includes a first fully connected multi-head network and a second fully connected multi-head network. Step S530 includes, but is not limited to, step S531, step S532, and step S533.

Step S531: Perform start-prediction processing on the text to be detected and the vector to be detected through the first fully connected multi-head network to obtain the start prediction position of the text to be detected and the start mark position of the vector to be detected.

Step S532: Perform end-prediction processing on the text to be detected and the vector to be detected through the second fully connected multi-head network to obtain the end prediction position of the text to be detected and the end mark position of the vector to be detected.

Step S533: Obtain the confidence corresponding to the text to be detected according to the start prediction position, the start mark position, the end prediction position, and the end mark position.
In this embodiment, the hidden results corresponding to the candidate text P_A are input into two fully connected multi-head networks with the same structure but different parameters (the first fully connected multi-head network and the second fully connected multi-head network), which are used to predict the start position and the end position of the candidate answer, respectively. The multi-head result for the start position is denoted s_i, and the multi-head result for the end position is denoted e_i. Denoting the indices of the maximum values of the two fully connected multi-head networks as start and end, respectively, the candidate answer is

A_candidate = P_A[start : end],

that is, the span of the candidate text from position start to position end.
Specifically: start-prediction processing is performed on the text to be detected and the vector to be detected through the first fully connected multi-head network to obtain the start prediction position of the text to be detected and the start mark position of the vector to be detected, and end-prediction processing is performed through the second fully connected multi-head network to obtain the end prediction position of the text to be detected and the end mark position of the vector to be detected. Denoting the start prediction position by S_start, the start mark position by S_1, the end prediction position by e_end, and the end mark position by e_1, the corresponding confidence Score_2 can be expressed by formula (3):

Score_2 = (S_start - S_1) + (e_end - e_1)     (3)
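The span prediction of steps S531–S533 can be sketched with two linear scorers. The hidden states and weights are random placeholders, and taking the scores at position 0 as the mark positions S_1/e_1 is purely an illustrative assumption.

```python
import numpy as np

# Hypothetical token hidden states of one candidate text and two linear
# "multi-head" scorers (same structure, different parameters).
rng = np.random.default_rng(2)
hidden = rng.standard_normal((6, 8))   # 6 candidate-text tokens, hidden size 8
w_start = rng.standard_normal(8)       # first head: start-position scores s_i
w_end = rng.standard_normal(8)         # second head: end-position scores e_i

s = hidden @ w_start                   # s_i for each token i
e = hidden @ w_end                     # e_i for each token i
start, end = int(np.argmax(s)), int(np.argmax(e))  # predicted span endpoints

# Formula (3): Score_2 = (S_start - S_1) + (e_end - e_1); here the mark
# positions S_1, e_1 are illustrated as the scores at token 0.
S_1, e_1 = s[0], e[0]
score_2 = (s[start] - S_1) + (e[end] - e_1)
```

Since `s[start]` and `e[end]` are the maxima of their score vectors, this `score_2` is always non-negative under these assumptions.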
The goal of Score_2 is similar to that of the correlation value Score_1, but the starting point differs: the confidence Score_2 represents how confident the model is in the extracted answer. After Score_1 and Score_2 are obtained, the target score value Score is derived from them. If Score is greater than the preset prediction threshold, the A_candidate obtained above is taken as the candidate answer; if Score is less than the preset prediction threshold, there is no answer and an empty string is output.
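The final decision step can be sketched as below. The patent only states that Score is obtained from Score_1 and Score_2 and compared with a preset threshold; summing the two scores is an assumed combination, and the function name and example values are hypothetical.

```python
# Minimal sketch of the final answer decision (assumed combination: sum).
def predict_answer(score_1, score_2, candidate_answer, prediction_threshold):
    score = score_1 + score_2  # target score value Score (assumption)
    # Above the threshold → emit A_candidate; otherwise → no answer (empty string).
    return candidate_answer if score > prediction_threshold else ""

print(predict_answer(0.9, 1.5, "in 1998", 1.0))  # → in 1998
print(predict_answer(0.1, 0.2, "in 1998", 1.0))  # → (empty string: no answer)
```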
Referring to FIG. 8, in a second aspect, some embodiments of the present application further provide a question answer prediction apparatus 800, including an acquisition module 810, an encoding module 820, an attention screening module 830, an association module 840, an answer screening module 850, a processing module 860, and a matching module 870.

The acquisition module 810 is configured to acquire original topic data to be predicted, where the original topic data includes original article data and original question data to be answered.

The encoding module 820 is configured to encode the original article data and the original question data according to a preset first pre-training model to obtain a question encoding vector and an article encoding vector.

The attention screening module 830 is configured to perform attention-screening processing on the question encoding vector and the article encoding vector to obtain multiple candidate texts.

The association module 840 is configured to perform association processing on the original question data and each candidate text to obtain associated data, where the associated data includes a question label vector, a candidate label vector corresponding to each candidate text, and a correlation value; the correlation value represents the relatedness between the original question data and each candidate text.

The answer screening module 850 is configured to perform answer-screening processing on the question label vector, the multiple candidate texts, and each candidate label vector to obtain the confidence corresponding to each candidate text, where the confidence represents the probability that the candidate text contains the candidate answer.

The processing module 860 is configured to determine a candidate position according to the correlation value, the confidence, and a preset prediction threshold, where the candidate position is the position of the candidate answer.

The matching module 870 is configured to match the corresponding candidate text according to the candidate position to obtain the candidate answer.
The question answer prediction apparatus 800 of the embodiments of the present application encodes the original article data and the original question data in the original topic data through a preset first pre-training model to obtain a question encoding vector and an article encoding vector; performs attention-screening processing on the two vectors to obtain multiple candidate texts; performs association processing on the original question data and each candidate text to obtain a correlation value representing their relatedness, a question label vector corresponding to the original question data, and a candidate label vector corresponding to each candidate text; performs answer-screening processing on the question label vector, the candidate texts, and the candidate label vectors to obtain a confidence representing the probability that each candidate text contains the candidate answer; and finally determines the candidate position, and thus the candidate answer, according to the correlation value, the confidence, and a preset prediction threshold. Through the attention screening mechanism, the association processing, and the answer screening processing, useless text information unrelated to the answer can be deleted and the parts of the article relevant to the question can be effectively selected, thereby improving the accuracy of the predicted answer.

It should be noted that the question answer prediction apparatus of the embodiments of the present application corresponds to the foregoing question answer prediction method; for the specific prediction process, refer to the foregoing method, which is not repeated here.
An embodiment of the present application further provides an electronic device, including:

at least one memory;

at least one processor;

at least one program;

The program is stored in the memory, and the processor executes the at least one program to implement the question answer prediction method of the present application, where the method includes: acquiring original topic data to be predicted, the original topic data including original article data and original question data to be answered; encoding the original article data and the original question data according to a preset first pre-training model to obtain a question encoding vector and an article encoding vector; performing attention-screening processing on the question encoding vector and the article encoding vector to obtain multiple candidate texts; performing association processing on the original question data and each candidate text to obtain associated data, the associated data including a question label vector, a candidate label vector corresponding to each candidate text, and a correlation value, where the correlation value represents the relatedness between the original question data and each candidate text; performing answer-screening processing on the question label vector, the multiple candidate texts, and each candidate label vector to obtain the confidence corresponding to each candidate text, where the confidence represents the probability that the candidate text contains the candidate answer; determining a candidate position according to the correlation value, the confidence, and a preset prediction threshold, where the candidate position is the position of the candidate answer; and matching the corresponding candidate text according to the candidate position to obtain the candidate answer. The electronic device may be any intelligent terminal, including a mobile phone, a tablet computer, a personal digital assistant (PDA), a vehicle-mounted computer, and the like.
The electronic device of the embodiments of the present application is described in detail below with reference to FIG. 9.

As shown in FIG. 9, which illustrates the hardware structure of an electronic device of another embodiment, the electronic device includes:

a processor 910, which may be implemented by a general-purpose central processing unit (CPU), a microprocessor, an application-specific integrated circuit (ASIC), or one or more integrated circuits, and is configured to execute relevant programs to realize the technical solutions provided by the embodiments of the present application;

a memory 920, which may be implemented in the form of a read-only memory (ROM), a static storage device, a dynamic storage device, or a random access memory (RAM). The memory 920 may store an operating system and other application programs; when the technical solutions provided by the embodiments of this specification are implemented through software or firmware, the relevant program code is stored in the memory 920 and called by the processor 910 to execute the question answer prediction method of the embodiments of the present application;

an input/output interface 930, configured to input and output information;

a communication interface 940, configured to realize communication interaction between this device and other devices, where communication may be realized in a wired manner (e.g., USB or network cable) or in a wireless manner (e.g., mobile network, WiFi, or Bluetooth);

a bus 950, which transfers information between the components of the device (e.g., the processor 910, the memory 920, the input/output interface 930, and the communication interface 940);

where the processor 910, the memory 920, the input/output interface 930, and the communication interface 940 are communicatively connected to one another inside the device through the bus 950.
An embodiment of the present application further provides a storage medium, which is a computer-readable storage medium storing computer-executable instructions. The computer-executable instructions are used to cause a computer to execute a question answer prediction method, where the method includes: acquiring original topic data to be predicted, the original topic data including original article data and original question data to be answered; encoding the original article data and the original question data according to a preset first pre-training model to obtain a question encoding vector and an article encoding vector; performing attention-screening processing on the question encoding vector and the article encoding vector to obtain multiple candidate texts; performing association processing on the original question data and each candidate text to obtain associated data, the associated data including a question label vector, a candidate label vector corresponding to each candidate text, and a correlation value, where the correlation value represents the relatedness between the original question data and each candidate text; performing answer-screening processing on the question label vector, the multiple candidate texts, and each candidate label vector to obtain the confidence corresponding to each candidate text, where the confidence represents the probability that the candidate text contains the candidate answer; determining a candidate position according to the correlation value, the confidence, and a preset prediction threshold, where the candidate position is the position of the candidate answer; and matching the corresponding candidate text according to the candidate position to obtain the candidate answer.

The computer-readable storage medium may be non-volatile or volatile. As a non-transitory computer-readable storage medium, the memory may be used to store non-transitory software programs and non-transitory computer-executable programs. In addition, the memory may include a high-speed random access memory, and may also include a non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid-state storage device. In some embodiments, the memory optionally includes memories remotely disposed relative to the processor, and these remote memories may be connected to the processor through a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The embodiments described herein are intended to illustrate the technical solutions of the embodiments of the present application more clearly and do not constitute a limitation on them. Those skilled in the art will appreciate that, with the evolution of technology and the emergence of new application scenarios, the technical solutions provided by the embodiments of the present application are equally applicable to similar technical problems.

Those skilled in the art will understand that the technical solutions shown in the figures do not constitute a limitation on the embodiments of the present application, and may include more or fewer steps than shown, combine certain steps, or use different steps.

The apparatus embodiments described above are merely illustrative; units described as separate components may or may not be physically separated, that is, they may be located in one place or distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.

Those of ordinary skill in the art will understand that all or some of the steps of the methods and the functional modules/units of the systems and apparatuses disclosed above may be implemented as software, firmware, hardware, or appropriate combinations thereof.

The terms "first", "second", "third", "fourth", and the like (if any) in the description of the present application and the above drawings are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence. It should be understood that data so used are interchangeable where appropriate, so that the embodiments of the present application described here can be implemented in orders other than those illustrated or described here. Furthermore, the terms "including" and "having", and any variations thereof, are intended to cover non-exclusive inclusion; for example, a process, method, system, product, or device comprising a series of steps or units is not necessarily limited to those steps or units expressly listed, and may include other steps or units not expressly listed or inherent to the process, method, product, or device.

It should be understood that in the present application, "at least one (item)" means one or more, and "multiple" means two or more. "And/or" describes an association relationship between associated objects and indicates that three kinds of relationships may exist; for example, "A and/or B" may mean: only A exists, only B exists, or both A and B exist, where A and B may be singular or plural. The character "/" generally indicates an "or" relationship between the associated objects. "At least one of the following" or similar expressions refer to any combination of those items, including any combination of single or plural items. For example, at least one of a, b, or c may mean: a, b, c, "a and b", "a and c", "b and c", or "a and b and c", where a, b, and c may each be single or multiple.

In the several embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative; the division of the units is only a logical functional division, and there may be other divisions in actual implementation. For example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, apparatuses, or units, and may be electrical, mechanical, or in other forms.

Units described as separate components may or may not be physically separated, and components displayed as units may or may not be physical units; they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.

In addition, the functional units in the embodiments of the present application may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.

If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application, in essence or in the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes multiple instructions to cause an electronic device (which may be a personal computer, a server, a network device, or the like) to execute all or some of the steps of the methods of the embodiments of the present application. The aforementioned storage medium includes various media that can store programs, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.

The preferred embodiments of the embodiments of the present application have been described above with reference to the accompanying drawings, which does not thereby limit the scope of rights of the embodiments of the present application. Any modifications, equivalent replacements, and improvements made by those skilled in the art without departing from the scope and essence of the embodiments of the present application shall fall within the scope of rights of the embodiments of the present application.

Claims (20)

  1. A method for predicting an answer to a question, comprising:
    acquiring original topic data to be predicted, wherein the original topic data comprises original article data and original question data to be answered;
    encoding the original article data and the original question data according to a preset first pre-training model to obtain a question encoding vector and an article encoding vector;
    performing attention screening on the question encoding vector and the article encoding vector to obtain a plurality of candidate texts;
    associating the original question data with each candidate text to obtain associated data, wherein the associated data comprises a question label vector, a candidate label vector corresponding to each candidate text, and an association value, and the association value characterizes the relevance between the original question data and each candidate text;
    performing answer screening on the question label vector, the plurality of candidate texts, and each candidate label vector to obtain a confidence corresponding to each candidate text, wherein the confidence characterizes the probability that the candidate text contains a candidate answer;
    determining a candidate position according to the association value, the confidence, and a preset prediction threshold, wherein the candidate position is the position of the candidate answer; and
    matching the corresponding candidate text according to the candidate position to obtain the candidate answer.
  2. The method according to claim 1, wherein the original article data comprises a plurality of original texts, each original text comprises a plurality of text words, and the original question data comprises a plurality of question words; and
    performing attention screening on the question encoding vector and the article encoding vector to obtain a plurality of candidate texts comprises:
    performing an attention operation on the question encoding vector and the article encoding vector according to a preset first attention model to obtain an attention matrix, wherein the attention matrix comprises a plurality of attention values, and each attention value characterizes the importance of a text word to a question word;
    acquiring a preset first attention threshold; and
    screening the original texts according to the first attention threshold and the attention matrix to obtain the plurality of candidate texts.
  3. The method according to claim 2, wherein screening the original texts according to the first attention threshold and the attention matrix to obtain the plurality of candidate texts comprises:
    calculating, according to the attention matrix, the attention value of a same text word with respect to the original question data to obtain a corresponding text attention value, wherein the text attention value characterizes the importance of the text word to the original question data; and
    if the text attention value is greater than the first attention threshold, acquiring the original text corresponding to the text word to obtain a corresponding candidate text.
  4. The method according to claim 1, wherein associating the original question data with each candidate text to obtain associated data comprises:
    inputting the original question data and each candidate text into a preset second pre-training model, wherein the second pre-training model comprises a first neural network and a second neural network;
    performing classification labeling on the original question data and each candidate text through the first neural network to obtain the question label vector and the candidate label vector of each candidate text; and
    performing mapping and classification on the question label vector and each candidate label vector through the second neural network to obtain the corresponding association value.
  5. The method according to claim 4, wherein the second neural network comprises a fully connected layer and an activation classification layer; and
    performing mapping and classification on the question label vector and each candidate label vector through the second neural network to obtain the corresponding association value comprises:
    performing full-connection processing on the question label vector and each candidate label vector through the fully connected layer to obtain a corresponding fully connected value; and
    performing activation classification on the fully connected value through the activation classification layer to obtain the corresponding association value.
  6. The method according to any one of claims 1 to 5, wherein performing answer screening on the question label vector, the plurality of candidate texts, and each candidate label vector to obtain the confidence corresponding to each candidate text comprises:
    performing attention screening on the question label vector and each candidate label vector through a preset second attention model to obtain a plurality of texts to be detected;
    obtaining, according to the text to be detected and the candidate label vector, a vector to be detected corresponding to the text to be detected; and
    performing screening prediction on the text to be detected and the vector to be detected through a preset answer prediction model to obtain the confidence corresponding to the text to be detected.
  7. The method according to claim 6, wherein the answer prediction model comprises a first fully connected multi-head network and a second fully connected multi-head network; and
    performing screening prediction on the text to be detected and the vector to be detected through the preset answer prediction model to obtain the confidence corresponding to the text to be detected comprises:
    performing start-position prediction on the text to be detected and the vector to be detected through the first fully connected multi-head network to obtain a predicted start position of the text to be detected and a start label position of the vector to be detected;
    performing end-position prediction on the text to be detected and the vector to be detected through the second fully connected multi-head network to obtain a predicted end position of the text to be detected and an end label position of the vector to be detected; and
    obtaining the confidence corresponding to the text to be detected according to the predicted start position, the start label position, the predicted end position, and the end label position.
  8. An apparatus for predicting an answer to a question, comprising:
    an acquisition module, configured to acquire original topic data to be predicted, wherein the original topic data comprises original article data and original question data to be answered;
    an encoding module, configured to encode the original article data and the original question data according to a preset first pre-training model to obtain a question encoding vector and an article encoding vector;
    an attention screening module, configured to perform attention screening on the question encoding vector and the article encoding vector to obtain a plurality of candidate texts;
    an association module, configured to associate the original question data with each candidate text to obtain associated data, wherein the associated data comprises a question label vector, a candidate label vector corresponding to each candidate text, and an association value, and the association value characterizes the relevance between the original question data and each candidate text;
    an answer screening module, configured to perform answer screening on the question label vector, the plurality of candidate texts, and each candidate label vector to obtain a confidence corresponding to each candidate text, wherein the confidence characterizes the probability that the candidate text contains a candidate answer;
    a processing module, configured to determine a candidate position according to the association value, the confidence, and a preset prediction threshold, wherein the candidate position is the position of the candidate answer; and
    a matching module, configured to match the corresponding candidate text according to the candidate position to obtain the candidate answer.
  9. An electronic device, comprising:
    at least one memory;
    at least one processor; and
    at least one program;
    wherein the at least one program is stored in the at least one memory, and the at least one processor executes the at least one program to implement a method for predicting an answer to a question, the method comprising:
    acquiring original topic data to be predicted, wherein the original topic data comprises original article data and original question data to be answered;
    encoding the original article data and the original question data according to a preset first pre-training model to obtain a question encoding vector and an article encoding vector;
    performing attention screening on the question encoding vector and the article encoding vector to obtain a plurality of candidate texts;
    associating the original question data with each candidate text to obtain associated data, wherein the associated data comprises a question label vector, a candidate label vector corresponding to each candidate text, and an association value, and the association value characterizes the relevance between the original question data and each candidate text;
    performing answer screening on the question label vector, the plurality of candidate texts, and each candidate label vector to obtain a confidence corresponding to each candidate text, wherein the confidence characterizes the probability that the candidate text contains a candidate answer;
    determining a candidate position according to the association value, the confidence, and a preset prediction threshold, wherein the candidate position is the position of the candidate answer; and
    matching the corresponding candidate text according to the candidate position to obtain the candidate answer.
  10. The electronic device according to claim 9, wherein the original article data comprises a plurality of original texts, each original text comprises a plurality of text words, and the original question data comprises a plurality of question words; and
    performing attention screening on the question encoding vector and the article encoding vector to obtain a plurality of candidate texts comprises:
    performing an attention operation on the question encoding vector and the article encoding vector according to a preset first attention model to obtain an attention matrix, wherein the attention matrix comprises a plurality of attention values, and each attention value characterizes the importance of a text word to a question word;
    acquiring a preset first attention threshold; and
    screening the original texts according to the first attention threshold and the attention matrix to obtain the plurality of candidate texts.
  11. The electronic device according to claim 10, wherein screening the original texts according to the first attention threshold and the attention matrix to obtain the plurality of candidate texts comprises:
    calculating, according to the attention matrix, the attention value of a same text word with respect to the original question data to obtain a corresponding text attention value, wherein the text attention value characterizes the importance of the text word to the original question data; and
    if the text attention value is greater than the first attention threshold, acquiring the original text corresponding to the text word to obtain a corresponding candidate text.
  12. The electronic device according to claim 9, wherein associating the original question data with each candidate text to obtain associated data comprises:
    inputting the original question data and each candidate text into a preset second pre-training model, wherein the second pre-training model comprises a first neural network and a second neural network;
    performing classification labeling on the original question data and each candidate text through the first neural network to obtain the question label vector and the candidate label vector of each candidate text; and
    performing mapping and classification on the question label vector and each candidate label vector through the second neural network to obtain the corresponding association value.
  13. The electronic device according to claim 12, wherein the second neural network comprises a fully connected layer and an activation classification layer; and
    performing mapping and classification on the question label vector and each candidate label vector through the second neural network to obtain the corresponding association value comprises:
    performing full-connection processing on the question label vector and each candidate label vector through the fully connected layer to obtain a corresponding fully connected value; and
    performing activation classification on the fully connected value through the activation classification layer to obtain the corresponding association value.
  14. The electronic device according to any one of claims 9 to 13, wherein performing answer screening on the question label vector, the plurality of candidate texts, and each candidate label vector to obtain the confidence corresponding to each candidate text comprises:
    performing attention screening on the question label vector and each candidate label vector through a preset second attention model to obtain a plurality of texts to be detected;
    obtaining, according to the text to be detected and the candidate label vector, a vector to be detected corresponding to the text to be detected; and
    performing screening prediction on the text to be detected and the vector to be detected through a preset answer prediction model to obtain the confidence corresponding to the text to be detected.
  15. A storage medium, the storage medium being a computer-readable storage medium, wherein the computer-readable storage medium stores computer-executable instructions, and the computer-executable instructions are configured to cause a computer to execute a method for predicting an answer to a question, the method comprising:
    acquiring original topic data to be predicted, wherein the original topic data comprises original article data and original question data to be answered;
    encoding the original article data and the original question data according to a preset first pre-training model to obtain a question encoding vector and an article encoding vector;
    performing attention screening on the question encoding vector and the article encoding vector to obtain a plurality of candidate texts;
    associating the original question data with each candidate text to obtain associated data, wherein the associated data comprises a question label vector, a candidate label vector corresponding to each candidate text, and an association value, and the association value characterizes the relevance between the original question data and each candidate text;
    performing answer screening on the question label vector, the plurality of candidate texts, and each candidate label vector to obtain a confidence corresponding to each candidate text, wherein the confidence characterizes the probability that the candidate text contains a candidate answer;
    determining a candidate position according to the association value, the confidence, and a preset prediction threshold, wherein the candidate position is the position of the candidate answer; and
    matching the corresponding candidate text according to the candidate position to obtain the candidate answer.
  16. The storage medium according to claim 15, wherein the original article data comprises a plurality of original texts, each original text comprises a plurality of text words, and the original question data comprises a plurality of question words; and
    performing attention screening on the question encoding vector and the article encoding vector to obtain a plurality of candidate texts comprises:
    performing an attention operation on the question encoding vector and the article encoding vector according to a preset first attention model to obtain an attention matrix, wherein the attention matrix comprises a plurality of attention values, and each attention value characterizes the importance of a text word to a question word;
    acquiring a preset first attention threshold; and
    screening the original texts according to the first attention threshold and the attention matrix to obtain the plurality of candidate texts.
  17. The storage medium according to claim 16, wherein screening the original texts according to the first attention threshold and the attention matrix to obtain the plurality of candidate texts comprises:
    calculating, according to the attention matrix, the attention value of a same text word with respect to the original question data to obtain a corresponding text attention value, wherein the text attention value characterizes the importance of the text word to the original question data; and
    if the text attention value is greater than the first attention threshold, acquiring the original text corresponding to the text word to obtain a corresponding candidate text.
  18. The storage medium according to claim 15, wherein associating the original question data with each candidate text to obtain associated data comprises:
    inputting the original question data and each candidate text into a preset second pre-training model, wherein the second pre-training model comprises a first neural network and a second neural network;
    performing classification labeling on the original question data and each candidate text through the first neural network to obtain the question label vector and the candidate label vector of each candidate text; and
    performing mapping and classification on the question label vector and each candidate label vector through the second neural network to obtain the corresponding association value.
  19. The storage medium according to claim 18, wherein the second neural network comprises a fully connected layer and an activation classification layer; and
    performing mapping and classification on the question label vector and each candidate label vector through the second neural network to obtain the corresponding association value comprises:
    performing full-connection processing on the question label vector and each candidate label vector through the fully connected layer to obtain a corresponding fully connected value; and
    performing activation classification on the fully connected value through the activation classification layer to obtain the corresponding association value.
  20. The storage medium according to any one of claims 15 to 19, wherein performing answer screening on the question label vector, the plurality of candidate texts, and each candidate label vector to obtain the confidence corresponding to each candidate text comprises:
    performing attention screening on the question label vector and each candidate label vector through a preset second attention model to obtain a plurality of texts to be detected;
    obtaining, according to the text to be detected and the candidate label vector, a vector to be detected corresponding to the text to be detected; and
    performing screening prediction on the text to be detected and the vector to be detected through a preset answer prediction model to obtain the confidence corresponding to the text to be detected.
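For orientation only, the attention-screening and span-prediction steps recited in the claims above can be sketched as follows. This is an illustrative reconstruction, not the patented implementation: the function names (`screen_candidates`, `span_confidence`), the random stand-in encodings, the use of summed attention as the "text attention value", and the product of peak start/end probabilities as the confidence are all assumptions made for the sketch.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def screen_candidates(question_enc, article_enc, attention_threshold):
    # Illustrates claims 2-3: build an attention matrix between question words
    # and article words, aggregate each article word's attention over the whole
    # question, and keep words whose score exceeds the first attention threshold.
    attn = softmax(question_enc @ article_enc.T, axis=-1)  # (n_question, n_article)
    text_attn = attn.sum(axis=0)                           # per-article-word score
    return np.flatnonzero(text_attn > attention_threshold)

def span_confidence(candidate_enc, w_start, w_end):
    # Illustrates claim 7: two fully connected "heads" score every candidate
    # token as a start/end position; the confidence combines both peaks.
    start_prob = softmax(candidate_enc @ w_start)
    end_prob = softmax(candidate_enc @ w_end)
    start, end = int(start_prob.argmax()), int(end_prob.argmax())
    return (start, end), float(start_prob[start] * end_prob[end])

# Toy run with random "encodings" standing in for the pre-training model output.
rng = np.random.default_rng(0)
question_enc = rng.normal(size=(4, 16))    # 4 question words, 16-dim encodings
article_enc = rng.normal(size=(30, 16))    # 30 article words
# Each attention row sums to 1, so 4/30 is the mean per-article-word score.
cand_idx = screen_candidates(question_enc, article_enc, attention_threshold=4.0 / 30)
candidate_enc = article_enc[cand_idx] if cand_idx.size else article_enc
(start, end), conf = span_confidence(candidate_enc,
                                     rng.normal(size=16), rng.normal(size=16))
```

Thresholding the aggregated score mirrors claim 3's rule that a text word is kept only when its text attention value exceeds the preset first attention threshold; the start/end heads mirror the two fully connected multi-head networks of claim 7 in the simplest possible form.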
PCT/CN2022/090750 2022-01-11 2022-04-29 Question answer prediction method and prediction apparatus, electronic device, and storage medium WO2023134085A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210025867.7A CN114416962A (en) 2022-01-11 2022-01-11 Question answer prediction method, prediction device, electronic device, and storage medium
CN202210025867.7 2022-01-11

Publications (1)

Publication Number Publication Date
WO2023134085A1 true WO2023134085A1 (en) 2023-07-20

Family

ID=81272360

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/090750 WO2023134085A1 (en) 2022-01-11 2022-04-29 Question answer prediction method and prediction apparatus, electronic device, and storage medium

Country Status (2)

Country Link
CN (1) CN114416962A (en)
WO (1) WO2023134085A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117540730A (en) * 2023-10-10 2024-02-09 鹏城实验室 Text labeling method and device, computer equipment and storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108491433A (en) * 2018-02-09 2018-09-04 平安科技(深圳)有限公司 Chat answer method, electronic device and storage medium
US20180300314A1 (en) * 2017-04-12 2018-10-18 Petuum Inc. Constituent Centric Architecture for Reading Comprehension
CN110647629A (en) * 2019-09-20 2020-01-03 北京理工大学 Multi-document machine reading understanding method for multi-granularity answer sorting
CN113486203A (en) * 2021-07-09 2021-10-08 平安科技(深圳)有限公司 Data processing method and device based on question-answering platform and related equipment


Also Published As

Publication number Publication date
CN114416962A (en) 2022-04-29

Similar Documents

Publication Publication Date Title
CN110795543B (en) Unstructured data extraction method, device and storage medium based on deep learning
CN110737801B (en) Content classification method, apparatus, computer device, and storage medium
CN111753060A (en) Information retrieval method, device, equipment and computer readable storage medium
CN112131883B (en) Language model training method, device, computer equipment and storage medium
CN113255320A (en) Entity relation extraction method and device based on syntax tree and graph attention machine mechanism
CN114358007A (en) Multi-label identification method and device, electronic equipment and storage medium
CN113705315B (en) Video processing method, device, equipment and storage medium
CN111881292B (en) Text classification method and device
CN113128431B (en) Video clip retrieval method, device, medium and electronic equipment
CN113239169A (en) Artificial intelligence-based answer generation method, device, equipment and storage medium
CN114416995A (en) Information recommendation method, device and equipment
CN112632244A (en) Man-machine conversation optimization method and device, computer equipment and storage medium
CN114897060B (en) Training method and device for sample classification model, and sample classification method and device
CN111145914B (en) Method and device for determining text entity of lung cancer clinical disease seed bank
CN114691864A (en) Text classification model training method and device and text classification method and device
CN117493491A (en) Natural language processing method and system based on machine learning
CN115827819A (en) Intelligent question and answer processing method and device, electronic equipment and storage medium
CN113392265A (en) Multimedia processing method, device and equipment
CN114492661B (en) Text data classification method and device, computer equipment and storage medium
CN113449081A (en) Text feature extraction method and device, computer equipment and storage medium
CN114091452A (en) Adapter-based transfer learning method, device, equipment and storage medium
CN117217277A (en) Pre-training method, device, equipment, storage medium and product of language model
WO2023134085A1 (en) Question answer prediction method and prediction apparatus, electronic device, and storage medium
CN114490949A (en) Document retrieval method, device, equipment and medium based on BM25 algorithm
CN111445545B (en) Text transfer mapping method and device, storage medium and electronic equipment

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22919725

Country of ref document: EP

Kind code of ref document: A1