CN115905459A - Question answer prediction method, device and storage medium - Google Patents


Info

Publication number
CN115905459A
CN115905459A
Authority
CN
China
Prior art keywords
paragraph
inputting
question
answer
predicting
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210217195.XA
Other languages
Chinese (zh)
Inventor
刘光辉
赵国庆
权佳成
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Finite Element Technology Co Ltd
Original Assignee
Beijing Finite Element Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Finite Element Technology Co Ltd
Priority to CN202210217195.XA
Publication of CN115905459A
Legal status: Pending

Landscapes

  • Machine Translation (AREA)

Abstract

The application discloses a method, an apparatus, and a storage medium for predicting answers to questions. The method comprises the following steps: acquiring a question and a paragraph associated with the question; inputting the question and the paragraph into a pre-trained deep learning model and outputting a text vector; inputting the text vector into a first convolutional neural network and predicting the head position, in the paragraph, of the answer to the question; inputting the text vector into a second convolutional neural network and predicting the tail position of the answer in the paragraph; and determining the answer to the question from the paragraph based on the predicted head and tail positions.

Description

Question answer prediction method, device and storage medium
Technical Field
The present application relates to the field of information processing technologies, and in particular, to a method and an apparatus for predicting answers to questions, and a storage medium.
Background
Machine Reading Comprehension (MRC) is a task for testing the extent to which a machine understands natural language by asking the machine to answer questions about a given context. Early MRC systems were rule-based and performed poorly. With the advent of deep learning and large-scale data sets, deep-learning-based MRC has significantly outperformed rule-based MRC. Common MRC tasks can be divided into four types: cloze (fill-in-the-blank), multiple choice, span extraction, and free-form answering. A general MRC architecture consists of several modules: Embedding, Feature Extraction, Context-Query Interaction, and Answer Prediction. In addition, in view of the limitations of current methods, new tasks have emerged from MRC, such as knowledge-based MRC, MRC with unanswerable questions, multi-passage MRC, and conversational question answering.
Currently, the BiDAF model is mostly used to predict answers to questions. However, its applicability is quite limited: BiDAF only works well on a short passage. If a longer passage, or many short passages, are given, the BiDAF model usually takes a long time and returns a single candidate span as the answer, with lower accuracy.
In addition, the Standard Attentive Reader model has also been used to predict answers to questions. However, the answer output by the Standard Attentive Reader model is directly excerpted from the characters of the paragraph, so it cannot handle yes/no judgment, counting, and similar question types. Moreover, the model's selection of the answer depends on the paragraph, which may differ from the actual information need. If the question is long, the Standard Attentive Reader model also suffers from information loss.
In view of the above technical problems in the prior art, namely that predicting answers to questions results in information loss and low accuracy, no effective solution has yet been proposed.
Disclosure of Invention
Embodiments of the present invention provide a method, an apparatus, and a storage medium for predicting answers to questions, so as to solve at least the technical problems of information loss and low accuracy caused by predicting answers to questions in the prior art.
According to one aspect of the embodiments of the present invention, there is provided a method for predicting answers to questions, including: acquiring a question and a paragraph associated with the question; inputting the question and the paragraph into a pre-trained deep learning model and outputting a text vector; inputting the text vector into a first convolutional neural network and predicting the head position, in the paragraph, of the answer to the question; inputting the text vector into a second convolutional neural network and predicting the tail position of the answer in the paragraph; and determining the answer to the question from the paragraph based on the predicted head and tail positions.
Optionally, inputting the question and the paragraph into a pre-trained deep learning model and outputting a text vector includes: generating an input sequence corresponding to the question and the paragraph based on the special classification embedding and the special separator, wherein the input sequence is the sum of the token embedding, segmentation embedding, and position embedding; and inputting the input sequence into the pre-trained deep learning model, outputting an encoding vector corresponding to each token in the paragraph, and recording all the encoding vectors of the paragraph as the text vector.
Optionally, inputting the question and the paragraph into a pre-trained deep learning model includes: inputting the question and the paragraph into a pre-trained Bert model; inputting the question and the paragraph into a pre-trained ALBERT model; or inputting the question and the paragraph into a pre-trained BiLSTM + Attention model.
Optionally, inputting the text vector into a first convolutional neural network and predicting the head position, in the paragraph, of the answer to the question includes: inputting the text vector into a first convolutional neural network consisting of a first CNN network and a first Dense network, and predicting the head position of the answer in the paragraph in combination with a half-pointer, half-label structure.
Optionally, inputting the text vector into a second convolutional neural network and predicting the tail position of the answer in the paragraph includes: inputting the text vector into a second convolutional neural network consisting of a second CNN network and a second Dense network, and predicting the tail position of the answer in the paragraph in combination with a half-pointer, half-label structure.
Optionally, the network structure of the first convolutional neural network and the second convolutional neural network is the same.
According to another aspect of the embodiments of the present invention, there is also provided a storage medium including a stored program, wherein the method of any one of the above is performed by a processor when the program is run.
According to another aspect of the embodiments of the present invention, there is provided an apparatus for predicting answers to questions, including: the acquisition module is used for acquiring the question and a paragraph associated with the question; the coding module is used for inputting the questions and paragraphs into a pre-trained deep learning model and outputting text vectors; the first prediction module is used for inputting the text vector into a first convolutional neural network and predicting the head position of the answer of the question in the paragraph; the second prediction module is used for inputting the text vector into a second convolutional neural network and predicting the tail position of the answer in the paragraph; and the answer determining module is used for determining answers of the questions from the paragraphs based on the predicted head positions and the predicted tail positions.
Optionally, the encoding module is specifically configured to: generate an input sequence corresponding to the question and the paragraph based on the special classification embedding and the special separator, wherein the input sequence is the sum of the token embedding, segmentation embedding, and position embedding; and input the input sequence into the pre-trained deep learning model, output an encoding vector corresponding to each token in the paragraph, and record all the encoding vectors of the paragraph as the text vector.
According to another aspect of the embodiments of the present invention, there is also provided an apparatus for predicting answers to questions, including: a processor; and a memory coupled to the processor for providing instructions to the processor for processing the following processing steps: acquiring a question and a paragraph associated with the question; inputting the questions and paragraphs into a pre-trained deep learning model, and outputting text vectors; inputting the text vector into a first convolutional neural network, and predicting the head position of the answer of the question in the paragraph; inputting the text vector into a second convolutional neural network, and predicting the tail position of the answer in the paragraph; based on the predicted leading and trailing positions, an answer to the question is determined from the passage.
In the embodiment of the invention, a question and a paragraph associated with the question are firstly obtained, then the question and the paragraph are input into a pre-trained deep learning model, a text vector is output, then the text vector is input into a first convolutional neural network, the text vector is input into a second convolutional neural network, the head position and the tail position of an answer in the paragraph are respectively predicted, and finally the answer of the question is determined from the paragraph based on the predicted head position and the predicted tail position. According to the invention, the problem and the paragraph are coded by using the pre-trained deep learning model, the text vector fusing the problem context logic semantic information and the paragraph context logic semantic information can be output, and the problem of information loss caused by the length of the problem is effectively solved. The head position and the tail position of the answer in the paragraph are respectively predicted through the two convolutional neural networks, so that the answer of the question can be accurately determined from the paragraph according to the predicted head position and tail position. The problem of the prediction of the problem answer can lead to information loss and the technical problem that the rate of accuracy is low among the prior art is solved.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the invention without limiting the invention. In the drawings:
fig. 1 is a block diagram of a hardware configuration of a computing apparatus for implementing the method according to embodiment 1 of the present invention;
fig. 2 is a flowchart illustrating a method for predicting answers to questions according to the first aspect of embodiment 1 of the present invention;
fig. 3 is a schematic diagram of a method for predicting answers to questions according to embodiment 1 of the present invention;
fig. 4 is a schematic diagram of an apparatus for predicting answers to questions according to embodiment 2 of the present invention; and
fig. 5 is a schematic diagram of an apparatus for predicting answers to questions according to embodiment 3 of the present invention.
Detailed Description
In order to make those skilled in the art better understand the technical solution of the present invention, the technical solution in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention. It is to be understood that the described embodiments are merely a portion of the embodiments of the present invention, rather than all of them. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are capable of operation in other sequences than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
First, some terms or terms appearing in the description of the embodiments of the present invention are applicable to the following explanations:
MRC: machine reading comprehension;
LSTM: long and short term memory networks, a special RNN;
CNN: a convolutional neural network;
glove: it is a word representation tool based on global word frequency statistics, which can represent a word as a vector consisting of real numbers, and these vectors capture some semantic characteristics between words, such as similarity (similarity), analogy (analogy), etc.
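The similarity property mentioned for GloVe can be illustrated with cosine similarity over word vectors. The 3-dimensional vectors below are made-up stand-ins for real (typically 50- to 300-dimensional) GloVe embeddings:

```python
import math

def cosine_similarity(u, v):
    # Cosine of the angle between two word vectors: close to 1 for
    # semantically similar words, smaller for unrelated ones.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical vectors; real GloVe vectors are learned from corpus
# co-occurrence statistics.
king = [0.8, 0.6, 0.1]
queen = [0.7, 0.7, 0.1]
apple = [0.1, 0.2, 0.9]
```

With real GloVe vectors, the similarity between related words such as "king" and "queen" would likewise exceed that between unrelated words such as "king" and "apple".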
Example 1
According to the present embodiment, there is provided an embodiment of a method for predicting answers to questions, it is noted that the steps illustrated in the flowchart of the drawings may be performed in a computer system such as a set of computer-executable instructions, and that while a logical order is illustrated in the flowchart, in some cases the steps illustrated or described may be performed in an order different than here.
The method embodiment provided by the present embodiment may be executed in a server or a similar computing device. Fig. 1 illustrates a hardware configuration block diagram of a computing device for implementing the method for predicting answers to questions. As shown in fig. 1, the computing device may include one or more processors (which may include, but are not limited to, processing devices such as a microprocessor (MCU) or a programmable logic device (FPGA)), a memory for storing data, and a transmission device for communication functions. In addition, the computing device may also include: a display, an input/output interface (I/O interface), a Universal Serial Bus (USB) port (which may be included as one of the ports of the I/O interface), a network interface, a power source, and/or a camera. It will be understood by those skilled in the art that the structure shown in fig. 1 is only an illustration and is not intended to limit the structure of the electronic device. For example, the computing device may also include more or fewer components than shown in FIG. 1, or have a different configuration than shown in FIG. 1.
It should be noted that the one or more processors and/or other data processing circuitry described above may be referred to generally herein as "data processing circuitry". The data processing circuitry may be embodied in whole or in part in software, hardware, firmware, or any combination thereof. Further, the data processing circuit may be a single, stand-alone processing module, or incorporated, in whole or in part, into any of the other elements in the computing device. As referred to in the embodiments of the invention, the data processing circuit acts as a processor control (e.g. selection of variable resistance termination paths connected to the interface).
The memory may be used to store software programs and modules of application software, such as program instructions/data storage devices corresponding to the method for predicting answers to questions in the embodiments of the present invention, and the processor executes various functional applications and data processing by operating the software programs and modules stored in the memory, that is, implements the method for predicting answers to questions of the application software. The memory may include high speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some instances, the memory may further include memory remotely located from the processor, which may be connected to the computing device over a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The transmission device is used for receiving or transmitting data via a network. Specific examples of such networks may include wireless networks provided by communication providers of the computing devices. In one example, the transmission device includes a Network adapter (NIC) that can be connected to other Network devices through a base station so as to communicate with the internet. In one example, the transmission device may be a Radio Frequency (RF) module, which is used to communicate with the internet in a wireless manner.
The display may be, for example, a touch screen-type Liquid Crystal Display (LCD) that may enable a user to interact with a user interface of the computing device.
It should be noted that in some alternative embodiments, the computing device illustrated in fig. 1 may include hardware elements (including circuitry), software elements (including computer code stored on a computer-readable medium), or a combination of both hardware and software elements. It should be noted that FIG. 1 is only one example of a particular specific example and is intended to illustrate the types of components that may be present in a computing device as described above.
In the above operating environment, according to the first aspect of the present embodiment, a method for predicting answers to questions is provided. Fig. 2 shows a flow diagram of the method, which, with reference to fig. 2, comprises:
s202: acquiring a question and a paragraph associated with the question;
in the embodiment of the present invention, when predicting answers to questions, a Question (Question) and a paragraph (Passage) associated with the Question are acquired, and then an answer to the Question (Question) is found from the paragraph (Passage) based on the Question (Question) and the paragraph (Passage).
S204: inputting the questions and paragraphs into a pre-trained deep learning model, and outputting text vectors;
Optionally, inputting the question and the paragraph into a pre-trained deep learning model and outputting a text vector includes: generating an input sequence corresponding to the question and the paragraph based on the special classification embedding and the special separator, wherein the input sequence is the sum of the token embedding, segmentation embedding, and position embedding; and inputting the input sequence into the pre-trained deep learning model, outputting an encoding vector corresponding to each token in the paragraph, and recording all the encoding vectors of the paragraph as the text vector.
Optionally, inputting the question and the paragraph into a pre-trained deep learning model includes: inputting the question and the paragraph into a pre-trained Bert model; inputting the question and the paragraph into a pre-trained ALBERT model; or inputting the question and the paragraph into a pre-trained BiLSTM + Attention model.
In the embodiment of the invention, the pre-trained deep learning model may be a Bert model, an ALBERT model, or a BiLSTM + Attention model. Hereinafter, the Bert model is taken as an example; the processing of the ALBERT model and the BiLSTM + Attention model is the same as that of the Bert model and is not described again here.
With reference to fig. 3, when the pre-trained deep learning model is the Bert model, the Question and the paragraph (Passage) are input as text1 and text2 of the Bert model, respectively, separated by the special separator "[SEP]". As in other downstream tasks, the first token of the input sequence is the special classification embedding "[CLS]", and the input sequence is the sum of the token embedding, segmentation embedding, and position embedding.
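The input construction described above can be sketched as follows. This is a minimal illustration that uses whitespace splitting in place of the WordPiece tokenizer a real Bert model would use, and it returns id sequences rather than the summed embedding vectors:

```python
def build_input_sequence(question, passage):
    """Assemble a BERT-style input: [CLS] question [SEP] passage [SEP].

    Whitespace tokenization stands in for the model's real tokenizer.
    """
    q_tokens = question.split()
    p_tokens = passage.split()
    tokens = ["[CLS]"] + q_tokens + ["[SEP]"] + p_tokens + ["[SEP]"]
    # Segmentation embedding ids: 0 for the question part (text1),
    # 1 for the passage part (text2).
    segment_ids = [0] * (len(q_tokens) + 2) + [1] * (len(p_tokens) + 1)
    # Position embedding ids: one per token position.
    position_ids = list(range(len(tokens)))
    return tokens, segment_ids, position_ids

tokens, seg, pos = build_input_sequence(
    "Who wrote Hamlet ?", "Hamlet was written by William Shakespeare .")
```

In a real model, each of the three id sequences indexes its own embedding table, and the three embeddings are summed element-wise to form the input sequence described above.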
It is generally assumed that the Answer is contained in the paragraph, so the goal of the machine reading comprehension task is to obtain a range, i.e., span(start, end), where start denotes the position of the start character of the Answer in the paragraph (Passage), and end denotes the position of the end character of the Answer in the paragraph. The output of the Bert model is an encoding vector corresponding to each token of the paragraph, and all the encoding vectors of the paragraph are taken together as the text vector H.
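The span(start, end) convention can be illustrated as follows; the example passage and indices are invented for illustration, and the span is taken to be inclusive at both ends:

```python
def extract_span(passage_tokens, start, end):
    """Recover the answer text from a predicted span(start, end),
    with both indices inclusive and relative to the passage tokens."""
    if not (0 <= start <= end < len(passage_tokens)):
        return None  # degenerate prediction: treat as "no answer"
    return " ".join(passage_tokens[start:end + 1])

passage = "Hamlet was written by William Shakespeare .".split()
answer = extract_span(passage, 4, 5)  # span(4, 5) covers two tokens
```

A crossed span (start after end) signals an invalid prediction, which is why the helper returns None in that case.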
In the embodiment of the invention, the pre-trained deep learning model is used to encode the question and the paragraph; a text vector fusing the contextual logical-semantic information of the question and of the paragraph can be output, which effectively solves the problem of information loss caused by the length of the question. In addition, the invention adds a Bert layer, improving the representation capability of the model, and designs a new network structure to realize machine reading comprehension, achieving better results.
S206: inputting the text vector into a first convolution neural network, and predicting the head position of the answer of the question in a paragraph;
Optionally, inputting the text vector into a first convolutional neural network and predicting the head position, in the paragraph, of the answer to the question includes: inputting the text vector into a first convolutional neural network consisting of a first CNN network and a first Dense network, and predicting the head position of the answer in the paragraph in combination with a half-pointer, half-label structure.
S208: inputting the text vector into a second convolutional neural network, and predicting the tail position of the answer in the paragraph;
Optionally, inputting the text vector into a second convolutional neural network and predicting the tail position of the answer in the paragraph includes: inputting the text vector into a second convolutional neural network consisting of a second CNN network and a second Dense network, and predicting the tail position of the answer in the paragraph in combination with a half-pointer, half-label structure.
Optionally, the network structure of the first convolutional neural network and the network structure of the second convolutional neural network are the same.
In the embodiment of the present invention, as shown in fig. 3, two convolutional neural networks are trained in advance; the two networks have the same network structure, each consisting of a CNN network and a Dense network. When answer prediction is carried out, the text vector H output by the Bert model is passed into the first convolutional neural network and the second convolutional neural network, respectively. The first convolutional neural network then predicts the head position of the answer with a "half-pointer, half-label" structure, and the second convolutional neural network likewise predicts the tail position of the answer with a "half-pointer, half-label" structure. In this way, the head and tail positions of the answer can be accurately predicted, and multiple answers can be returned.
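A toy sketch of one such CNN + Dense head illustrating the "half-pointer, half-label" idea: instead of a single softmax over positions (a pure pointer), every token receives an independent sigmoid label, so several positions can fire at once. The scalar per-token features and all weights below are made-up stand-ins for the Bert text vector H and learned parameters:

```python
import math

def conv1d_same(features, kernel):
    # Same-padded 1-D convolution over scalar per-token features
    # (the CNN part of the head).
    k = len(kernel)
    pad = k // 2
    padded = [0.0] * pad + features + [0.0] * pad
    return [sum(kernel[j] * padded[i + j] for j in range(k))
            for i in range(len(features))]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def pointer_head(features, kernel, weight, bias):
    # CNN network followed by a Dense layer; the per-token sigmoid turns
    # each position into an independent start/end "label" (half-pointer,
    # half-label), which is what allows multiple answers to be flagged.
    hidden = conv1d_same(features, kernel)
    return [sigmoid(weight * h + bias) for h in hidden]

probs = pointer_head([0.1, 0.9, 0.2, 0.8, 0.1], [0.2, 0.6, 0.2], 4.0, -2.0)
start_positions = [i for i, p in enumerate(probs) if p > 0.5]
```

With these invented numbers, two positions clear the 0.5 threshold, so the head flags two candidate start positions at once, which a softmax pointer could not do.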
S210: determining the answer to the question from the paragraph based on the predicted head and tail positions.
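The final decoding step can be sketched as below. The patent does not spell out how predicted head and tail positions are paired, so the rule used here, matching each head with the nearest tail at or after it, is an assumption chosen to support the multi-answer return mentioned above:

```python
def decode_answers(passage_tokens, start_probs, end_probs, threshold=0.5):
    """Turn per-token head/tail probabilities into answer strings."""
    # Positions whose sigmoid probability clears the threshold are
    # treated as predicted head / tail positions.
    starts = [i for i, p in enumerate(start_probs) if p >= threshold]
    ends = [i for i, p in enumerate(end_probs) if p >= threshold]
    answers = []
    for s in starts:
        # Assumed pairing rule: nearest tail position at or after the head.
        tail = next((e for e in ends if e >= s), None)
        if tail is not None:
            answers.append(" ".join(passage_tokens[s:tail + 1]))
    return answers

tokens = "Hamlet was written by William Shakespeare .".split()
answers = decode_answers(
    tokens,
    [0.9, 0.1, 0.1, 0.1, 0.8, 0.1, 0.1],  # head-position probabilities
    [0.9, 0.1, 0.1, 0.1, 0.1, 0.8, 0.1])  # tail-position probabilities
```

Here two head positions fire, so two answers are returned, consistent with the multi-answer capability of the half-pointer, half-label heads.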
Therefore, according to the method for predicting answers to questions, a question and a paragraph associated with the question are obtained first; the question and the paragraph are then input into a pre-trained deep learning model, which outputs a text vector; the text vector is input into a first convolutional neural network and into a second convolutional neural network, which predict the head position and the tail position of the answer in the paragraph, respectively; and finally the answer to the question is determined from the paragraph based on the predicted head and tail positions. The invention uses the pre-trained deep learning model to encode the question and the paragraph, can output a text vector fusing the contextual logical-semantic information of the question and of the paragraph, and effectively solves the problem of information loss caused by the length of the question. Because the head and tail positions of the answer in the paragraph are predicted separately by the two convolutional neural networks, the answer to the question can be accurately determined from the paragraph according to the predicted head and tail positions. The technical problems in the prior art that predicting answers to questions leads to information loss and low accuracy are thereby solved.
It should be noted that, for simplicity of description, the above-mentioned method embodiments are described as a series of acts or combination of acts, but those skilled in the art will recognize that the present invention is not limited by the order of acts, as some steps may occur in other orders or concurrently in accordance with the invention. Further, those skilled in the art will appreciate that the embodiments described in this specification are presently preferred and that no acts or modules are required by the invention.
Through the above description of the embodiments, those skilled in the art can clearly understand that the method according to the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but the former is a better implementation mode in many cases. Based on such understanding, the technical solutions of the present invention or portions thereof contributing to the prior art may be embodied in the form of a software product, which is stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal device (which may be a mobile phone, a computer, a server, or a network device) to execute the method according to the embodiments of the present invention.
Example 2
Fig. 4 shows an apparatus 400 for predicting answers to questions according to the present embodiment, the apparatus 400 corresponding to the method according to the first aspect of embodiment 1. Referring to fig. 4, the apparatus 400 includes: an obtaining module 410, configured to obtain a question and a paragraph associated with the question; an encoding module 420, configured to input the question and the paragraph into a pre-trained deep learning model and output a text vector; a first prediction module 430, configured to input the text vector into a first convolutional neural network and predict the head position, in the paragraph, of the answer to the question; a second prediction module 440, configured to input the text vector into a second convolutional neural network and predict the tail position of the answer in the paragraph; and an answer determination module 450, configured to determine the answer to the question from the paragraph based on the predicted head and tail positions.
Optionally, the encoding module 420 is specifically configured to: generate an input sequence corresponding to the question and the paragraph based on the special classification embedding and the special separator, wherein the input sequence is the sum of the token embedding, segmentation embedding, and position embedding; and input the input sequence into the pre-trained deep learning model, output an encoding vector corresponding to each token in the paragraph, and record all the encoding vectors of the paragraph as the text vector.
Optionally, the encoding module 420 is further specifically configured to: inputting the questions and paragraphs into a pre-trained Bert model;
inputting the question and the paragraph into a pre-trained ALBERT model; or inputting the question and the paragraph into a pre-trained BiLSTM + Attention model.
Optionally, the first prediction module 430 is specifically configured to: input the text vector into a first convolutional neural network consisting of a first CNN network and a first Dense network, and predict the head position, in the paragraph, of the answer to the question in combination with a half-pointer, half-label structure.
Optionally, the second prediction module 440 is specifically configured to: input the text vector into a second convolutional neural network consisting of a second CNN network and a second Dense network, and predict the tail position of the answer in the paragraph in combination with a half-pointer, half-label structure.
Optionally, the network structure of the first convolutional neural network and the second convolutional neural network is the same.
Therefore, according to the present embodiment, a question and a paragraph associated with the question are obtained first; the question and the paragraph are then input into a pre-trained deep learning model, which outputs a text vector; the text vector is input into a first convolutional neural network and into a second convolutional neural network, which predict the head position and the tail position of the answer in the paragraph, respectively; and finally the answer to the question is determined from the paragraph based on the predicted head and tail positions. The invention uses the pre-trained deep learning model to encode the question and the paragraph, can output a text vector fusing the contextual logical-semantic information of the question and of the paragraph, and effectively solves the problem of information loss caused by the length of the question. Because the head and tail positions of the answer in the paragraph are predicted separately by the two convolutional neural networks, the answer to the question can be accurately determined from the paragraph according to the predicted head and tail positions. The technical problems in the prior art that predicting answers to questions leads to information loss and low accuracy are thereby solved.
Embodiment 3
Fig. 5 shows a device 500 for predicting answers to questions according to this embodiment; the device 500 corresponds to the method of the first aspect of Embodiment 1. Referring to Fig. 5, the device 500 includes: a processor 510; and a memory 520 coupled to the processor 510 and configured to provide the processor 510 with instructions for the following processing steps: obtaining a question and a paragraph associated with the question; inputting the question and the paragraph into a pre-trained deep learning model and outputting a text vector; inputting the text vector into a first convolutional neural network and predicting the head position of the answer to the question in the paragraph; inputting the text vector into a second convolutional neural network and predicting the tail position of the answer in the paragraph; and determining the answer to the question from the paragraph based on the predicted head and tail positions.
Optionally, inputting the question and the paragraph into the pre-trained deep learning model and outputting the text vector includes: generating an input sequence corresponding to the question and the paragraph based on a special classification token and a special separator token, where the input sequence is the sum of the token embedding, the segment embedding, and the position embedding; and inputting the input sequence into the pre-trained deep learning model, outputting an encoding vector for each token in the paragraph, and taking all the encoding vectors of the paragraph as the text vector.
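The input construction described above (a classification token, separator tokens, and the element-wise sum of token, segment, and position embeddings) can be sketched as follows. This is a minimal illustration only: the vocabulary size, embedding dimension, token ids, and the helper name `build_input_embeddings` are assumptions, not part of the patent.

```python
import numpy as np

def build_input_embeddings(question_ids, paragraph_ids, vocab_size=100,
                           max_len=32, dim=8, cls_id=0, sep_id=1, seed=0):
    """Sketch: input sequence = token embedding + segment embedding
    + position embedding, over [CLS] question [SEP] paragraph [SEP]."""
    rng = np.random.default_rng(seed)
    tok_emb = rng.normal(size=(vocab_size, dim))   # token embeddings
    seg_emb = rng.normal(size=(2, dim))            # segment embeddings (question=0, paragraph=1)
    pos_emb = rng.normal(size=(max_len, dim))      # position embeddings

    ids = [cls_id] + question_ids + [sep_id] + paragraph_ids + [sep_id]
    segs = [0] * (len(question_ids) + 2) + [1] * (len(paragraph_ids) + 1)
    # The input sequence is the element-wise sum of the three embeddings.
    x = tok_emb[ids] + seg_emb[segs] + pos_emb[:len(ids)]
    return ids, np.array(segs), x

ids, segs, x = build_input_embeddings([5, 6, 7], [10, 11, 12, 13])
print(len(ids), x.shape)  # 10 (10, 8)
```

In a real BERT-style model the three embedding tables are learned jointly; here they are random matrices only so the shapes and the summation are concrete.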
Optionally, inputting the question and the paragraph into the pre-trained deep learning model includes: inputting the question and the paragraph into a pre-trained BERT model; inputting the question and the paragraph into a pre-trained ALBERT model; or inputting the question and the paragraph into a pre-trained BiLSTM + Attention model.
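As one illustration of the third encoder option, a BiLSTM + Attention encoder can be sketched in plain NumPy. Everything below (parameter shapes, the additive attention form, the helper names) is an assumption made for illustration; the patent does not specify the implementation.

```python
import numpy as np

def lstm_pass(x, Wx, Wh, b):
    """Minimal unidirectional LSTM over (seq_len, in_dim) -> (seq_len, hidden)."""
    hidden = Wh.shape[0]
    h, c, outs = np.zeros(hidden), np.zeros(hidden), []
    for t in range(x.shape[0]):
        z = x[t] @ Wx + h @ Wh + b            # all four gates at once
        i, f, g, o = np.split(z, 4)
        i, f, o = (1.0 / (1.0 + np.exp(-v)) for v in (i, f, o))
        c = f * c + i * np.tanh(g)
        h = o * np.tanh(c)
        outs.append(h)
    return np.array(outs)

def bilstm_attention(x, params_fwd, params_bwd, w_att):
    """BiLSTM encoding followed by a simple attention weighting per token."""
    fwd = lstm_pass(x, *params_fwd)
    bwd = lstm_pass(x[::-1], *params_bwd)[::-1]   # backward pass, re-aligned
    h = np.concatenate([fwd, bwd], axis=1)        # (seq_len, 2*hidden)
    scores = np.tanh(h @ w_att).ravel()
    att = np.exp(scores - scores.max())
    att /= att.sum()                              # softmax attention weights
    return h * att[:, None]                       # attention-weighted text vectors

rng = np.random.default_rng(1)
seq, din, hid = 6, 8, 5
x = rng.normal(size=(seq, din))

def make_params():
    return (rng.normal(size=(din, 4 * hid)) * 0.1,
            rng.normal(size=(hid, 4 * hid)) * 0.1,
            np.zeros(4 * hid))

vec = bilstm_attention(x, make_params(), make_params(), rng.normal(size=(2 * hid, 1)))
print(vec.shape)  # (6, 10)
```

A pre-trained BERT or ALBERT encoder would replace this whole function; the point of the sketch is only that each token ends up with a context-aware vector of fixed width.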
Optionally, inputting the text vector into the first convolutional neural network and predicting the head position of the answer to the question in the paragraph includes: inputting the text vector into a first convolutional neural network composed of a first CNN network and a first Dense network, and predicting the head position of the answer to the question in the paragraph using a half-pointer, half-label structure.
Optionally, inputting the text vector into the second convolutional neural network and predicting the tail position of the answer in the paragraph includes: inputting the text vector into a second convolutional neural network composed of a second CNN network and a second Dense network, and predicting the tail position of the answer in the paragraph using a half-pointer, half-label structure.
Optionally, the first convolutional neural network and the second convolutional neural network have the same network structure.
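The two structurally identical CNN + Dense heads with a half-pointer, half-label output (an independent sigmoid score per token marking it as a span boundary) can be sketched as follows; the kernel size, channel count, and random parameters are illustrative assumptions, not the patent's configuration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def conv1d_same(x, w, b):
    """1-D convolution over the sequence axis with 'same' padding and ReLU.
    x: (seq_len, in_dim), w: (k, in_dim, out_dim), b: (out_dim,)."""
    k = w.shape[0]
    pad = k // 2
    xp = np.pad(x, ((pad, pad), (0, 0)))
    out = np.stack([
        np.tensordot(xp[i:i + k], w, axes=([0, 1], [0, 1])) + b
        for i in range(x.shape[0])
    ])
    return np.maximum(out, 0.0)

def span_head(text_vec, w_conv, b_conv, w_dense, b_dense):
    """One CNN + Dense head in half-pointer, half-label style: each token
    gets an independent sigmoid probability of being a span boundary."""
    h = conv1d_same(text_vec, w_conv, b_conv)
    return sigmoid(h @ w_dense + b_dense).ravel()   # (seq_len,) probabilities

rng = np.random.default_rng(0)
seq_len, dim, channels = 12, 8, 16
text_vec = rng.normal(size=(seq_len, dim))
# Two heads with the same structure but separate parameters:
# one for the head (start) position, one for the tail (end) position.
params = [(rng.normal(size=(3, dim, channels)), np.zeros(channels),
           rng.normal(size=(channels, 1)), np.zeros(1)) for _ in range(2)]
start_p = span_head(text_vec, *params[0])
end_p = span_head(text_vec, *params[1])
print(start_p.shape, end_p.shape)  # (12,) (12,)
```

Unlike a softmax pointer that forces exactly one boundary per sequence, the per-token sigmoid labelling lets the model score every position independently, which is what the half-pointer, half-label structure refers to.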
Thus, in this embodiment, the question and the paragraph associated with it are first obtained; the question and the paragraph are then input into a pre-trained deep learning model, which outputs a text vector; the text vector is input into a first convolutional neural network and into a second convolutional neural network to predict the head position and the tail position of the answer in the paragraph, respectively; and finally the answer to the question is determined from the paragraph based on the predicted head and tail positions. By encoding the question and the paragraph with a pre-trained deep learning model, the invention outputs a text vector that fuses the contextual semantic information of the question with that of the paragraph, effectively avoiding the information loss caused by question length. Because the head and tail positions of the answer are predicted by two separate convolutional neural networks, the answer can be accurately located in the paragraph from the predicted positions. This addresses the technical problems of information loss and low accuracy in prior-art question answer prediction.
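A simple way to turn the predicted head and tail probabilities into an answer span is sketched below. The decoding rule (take the highest-scoring head position, then the nearest tail position at or after it that clears a threshold) is a common convention assumed here for illustration; the patent does not prescribe a specific decoding rule.

```python
def decode_span(start_probs, end_probs, tokens, threshold=0.5):
    """Pick the highest-scoring start position, then the nearest end
    position at or after it whose probability clears the threshold."""
    start = max(range(len(start_probs)), key=lambda i: start_probs[i])
    for end in range(start, len(end_probs)):
        if end_probs[end] >= threshold:
            return tokens[start:end + 1]
    return tokens[start:start + 1]  # fall back to a single-token answer

tokens = ["the", "capital", "is", "Paris", "today"]
start_probs = [0.1, 0.2, 0.1, 0.9, 0.1]
end_probs = [0.0, 0.1, 0.0, 0.8, 0.2]
print(decode_span(start_probs, end_probs, tokens))  # ['Paris']
```

Constraining the end position to lie at or after the start position is what guarantees that a contiguous answer can always be read out of the paragraph.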
The serial numbers of the above embodiments of the present invention are for description only and do not indicate the relative merits of the embodiments.
In the above embodiments of the present invention, the description of each embodiment has its own emphasis; for parts not described in detail in one embodiment, reference may be made to the related descriptions of other embodiments.
In the embodiments provided in this application, it should be understood that the disclosed technical content may be implemented in other ways. The device embodiments described above are merely illustrative; for example, the division into units is only a division by logical function, and other divisions are possible in an actual implementation: multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. Furthermore, the couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through interfaces, units, or modules, and may be electrical or take other forms.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
If the integrated unit is implemented as a software functional unit and sold or used as a stand-alone product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention, in essence, or the part that contributes over the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The software product is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to perform all or some of the steps of the methods of the embodiments of the present invention. The aforementioned storage medium includes media capable of storing program code, such as a USB flash drive, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, or a magnetic or optical disk.
The foregoing is only a preferred embodiment of the present invention. It should be noted that those skilled in the art can make various improvements and refinements without departing from the principle of the present invention, and such improvements and refinements shall also fall within the protection scope of the present invention.

Claims (10)

1. A method for predicting answers to questions, comprising:
acquiring a question and a paragraph associated with the question;
inputting the question and the paragraph into a pre-trained deep learning model, and outputting a text vector;
inputting the text vector into a first convolutional neural network, and predicting the head position of the answer to the question in the paragraph;
inputting the text vector into a second convolutional neural network, and predicting the tail position of the answer in the paragraph; and
determining the answer to the question from the paragraph based on the predicted head and tail positions.
2. The method of claim 1, wherein inputting the question and the paragraph into the pre-trained deep learning model and outputting the text vector comprises:
generating an input sequence corresponding to the question and the paragraph based on a special classification token and a special separator token, wherein the input sequence is the sum of the token embedding, the segment embedding, and the position embedding; and
inputting the input sequence into the pre-trained deep learning model, outputting an encoding vector for each token in the paragraph, and taking all the encoding vectors of the paragraph as the text vector.
3. The method of claim 1 or 2, wherein inputting the question and the paragraph into the pre-trained deep learning model comprises:
inputting the question and the paragraph into a pre-trained BERT model;
inputting the question and the paragraph into a pre-trained ALBERT model; or
inputting the question and the paragraph into a pre-trained BiLSTM + Attention model.
4. The method of claim 1, wherein inputting the text vector into the first convolutional neural network and predicting the head position of the answer to the question in the paragraph comprises: inputting the text vector into a first convolutional neural network composed of a first CNN network and a first Dense network, and predicting the head position of the answer to the question in the paragraph using a half-pointer, half-label structure.
5. The method of claim 1, wherein inputting the text vector into the second convolutional neural network and predicting the tail position of the answer in the paragraph comprises: inputting the text vector into a second convolutional neural network composed of a second CNN network and a second Dense network, and predicting the tail position of the answer in the paragraph using a half-pointer, half-label structure.
6. The method of claim 1, wherein the first convolutional neural network and the second convolutional neural network have the same network structure.
7. A storage medium comprising a stored program, wherein the method of any one of claims 1 to 6 is performed by a processor when the program is run.
8. An apparatus for predicting answers to questions, comprising:
the acquisition module is used for acquiring the question and a paragraph associated with the question;
the coding module is used for inputting the question and the paragraph into a pre-trained deep learning model and outputting a text vector;
the first prediction module is used for inputting the text vector into a first convolutional neural network and predicting the head position of the answer to the question in the paragraph;
the second prediction module is used for inputting the text vector into a second convolutional neural network and predicting the tail position of the answer in the paragraph;
and the answer determining module is used for determining answers of the questions from the paragraphs based on the predicted head positions and the predicted tail positions.
9. The prediction device according to claim 8, characterized in that the coding module is specifically configured to:
generating an input sequence corresponding to the question and the paragraph based on a special classification token and a special separator token, wherein the input sequence is the sum of the token embedding, the segment embedding, and the position embedding; and
inputting the input sequence into the pre-trained deep learning model, outputting an encoding vector for each token in the paragraph, and taking all the encoding vectors of the paragraph as the text vector.
10. An apparatus for predicting answers to questions, comprising:
a processor; and
a memory coupled to the processor for providing instructions to the processor for processing the following processing steps:
acquiring a question and a paragraph associated with the question;
inputting the question and the paragraph into a pre-trained deep learning model, and outputting a text vector;
inputting the text vector into a first convolutional neural network, and predicting the head position of the answer to the question in the paragraph;
inputting the text vector into a second convolutional neural network, and predicting the tail position of the answer in the paragraph; and
determining the answer to the question from the paragraph based on the predicted head and tail positions.
CN202210217195.XA 2022-03-07 2022-03-07 Question answer prediction method, device and storage medium Pending CN115905459A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210217195.XA CN115905459A (en) 2022-03-07 2022-03-07 Question answer prediction method, device and storage medium


Publications (1)

Publication Number Publication Date
CN115905459A true CN115905459A (en) 2023-04-04

Family

ID=86492104

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210217195.XA Pending CN115905459A (en) 2022-03-07 2022-03-07 Question answer prediction method, device and storage medium

Country Status (1)

Country Link
CN (1) CN115905459A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110688491A (en) * 2019-09-25 2020-01-14 暨南大学 Machine reading understanding method, system, device and medium based on deep learning
CN112464641A (en) * 2020-10-29 2021-03-09 平安科技(深圳)有限公司 BERT-based machine reading understanding method, device, equipment and storage medium
CN112560487A (en) * 2020-12-04 2021-03-26 中国电子科技集团公司第十五研究所 Entity relationship extraction method and system based on domestic equipment
CN112732993A (en) * 2020-12-31 2021-04-30 京东数字科技控股股份有限公司 Data processing method, data processing device, computer equipment and storage medium


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
XIAOYA LI: "A unified MRC framework for named entity recognition", https://arxiv.org/abs/1910.11476v6, 23 May 2020 (2020-05-23), pages 1-11 *

Similar Documents

Publication Publication Date Title
CN108170792B (en) Question and answer guiding method and device based on artificial intelligence and computer equipment
CN110909549A (en) Method, device and storage medium for punctuating ancient Chinese
CN114757176A (en) Method for obtaining target intention recognition model and intention recognition method
CN110348012B (en) Method, device, storage medium and electronic device for determining target character
CN114170468B (en) Text recognition method, storage medium and computer terminal
CN113076720B (en) Long text segmentation method and device, storage medium and electronic device
CN111552767A (en) Search method, search device and computer equipment
CN113254711A (en) Interactive image display method and device, computer equipment and storage medium
CN113158687A (en) Semantic disambiguation method and device, storage medium and electronic device
CN115858741A (en) Intelligent question answering method and device suitable for multiple scenes and storage medium
CN111274813A (en) Language sequence marking method, device storage medium and computer equipment
CN114049174A (en) Method and device for commodity recommendation, electronic equipment and storage medium
CN115905459A (en) Question answer prediction method, device and storage medium
CN115861606B (en) Classification method, device and storage medium for long-tail distributed documents
CN115130455A (en) Article processing method and device, electronic equipment and storage medium
CN115964543A (en) Chat content ordering method and device, robot equipment and readable storage medium
CN115273820A (en) Audio processing method and device, storage medium and electronic equipment
CN110956034B (en) Word acquisition method and device and commodity search method
CN111401083B (en) Name identification method and device, storage medium and processor
CN114595760A (en) Data classification method and device
CN113011182B (en) Method, device and storage medium for labeling target object
CN114385776A (en) Information positioning method, storage medium and device
CN112597757A (en) Word detection method and device, storage medium and electronic device
CN112445898A (en) Dialogue emotion analysis method and device, storage medium and processor
CN111104591B (en) Recommendation information generation method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Country or region after: China

Address after: 100083 129, Floor 1, Building 5, Yard 1, Shangdi Fourth Street, Haidian District, Beijing

Applicant after: Beijing Zhongkejin Finite Element Technology Co.,Ltd.

Address before: 100083 129, Floor 1, Building 5, Yard 1, Shangdi Fourth Street, Haidian District, Beijing

Applicant before: Beijing finite element technology Co.,Ltd.

Country or region before: China