CN114780700A - Intelligent question-answering method, device, equipment and medium based on machine reading understanding - Google Patents

Info

Publication number
CN114780700A
Authority
CN
China
Prior art keywords
question
document
preset
screening
text
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210415733.6A
Other languages
Chinese (zh)
Inventor
骆加维
阮晓雯
陈远旭
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd
Priority to CN202210415733.6A
Publication of CN114780700A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/332 Query formulation
    • G06F16/3329 Natural language query formulation or dialogue systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/338 Presentation of query results
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application discloses an intelligent question-answering method, device, equipment and storage medium based on machine reading understanding. The method comprises the following steps: acquiring a question text input by a user; searching for documents associated with the question text, and screening the associated documents based on a preset three-level screening strategy to obtain screened associated documents; and inputting the question text and the screened associated documents into a pre-trained machine reading model to obtain an output question answer. According to the intelligent question-answering method provided by the embodiment of the application, documents highly relevant to the user question are obtained based on the three-level screening strategy, and a correct and effective answer is then obtained based on the machine reading understanding model, which can effectively improve the accuracy of the machine reading understanding result.

Description

Intelligent question-answering method, device, equipment and medium based on machine reading understanding
Technical Field
The present application relates to the field of intelligent question and answer technologies, and in particular, to an intelligent question and answer method, apparatus, device, and medium based on machine reading understanding.
Background
With the progress of science and technology and the rapid development of intelligent devices and networks, people generate a large amount of data in daily life and have entered the era of big data. A share of this massive data is stored in the form of natural language and is an important source of information, which people search for what they need. However, finding the needed information through everyday searching often takes a lot of time and effort. Therefore, there is an increasing demand for intelligent question-answering systems.
At present, intelligent question-answering systems are rare and their degree of intelligence is low: they cannot understand the questions put forward by the user well and cannot return correct and effective answers. When a user asks a question, the answer given by the system is often irrelevant to the question, so the user obtains no useful information and cannot get the content of most concern. A large amount of data therefore loses its value to the user and is not fully utilized.
Disclosure of Invention
The embodiment of the application provides an intelligent question answering method, device, equipment and medium based on machine reading understanding. The following presents a simplified summary in order to provide a basic understanding of some aspects of the disclosed embodiments. This summary is not an extensive overview and is intended to neither identify key/critical elements nor delineate the scope of such embodiments. Its sole purpose is to present some concepts in a simplified form as a prelude to the more detailed description that is presented later.
In a first aspect, an embodiment of the present application provides an intelligent question answering method based on machine reading understanding, and the method includes:
acquiring a question text input by a user;
searching for documents associated with the question text, and screening the associated documents based on a preset three-level screening strategy to obtain screened associated documents;
and inputting the question text and the screened associated document into a pre-trained machine reading model to obtain an output question answer.
In one embodiment, after the obtaining of the question text input by the user, the method further includes:
searching answers corresponding to the question texts from a preset knowledge graph;
if the answer corresponding to the question text is found, returning the found question answer;
and if the answer corresponding to the question text cannot be found, searching the document associated with the question text.
In one embodiment, finding a document associated with the question text comprises:
performing word segmentation processing on the question text by adopting a preset word segmentation algorithm, and extracting question keywords;
searching based on the question keywords and a preset first search engine to obtain a first preset number of top-ranked associated documents;
and searching based on the question keywords and a preset second search engine to obtain a second preset number of top-ranked associated documents.
In one embodiment, screening the associated documents based on a preset three-level screening policy to obtain screened associated documents includes:
calculating a first similarity between the question text and the title of each associated document based on a preset word shift (word mover's distance) algorithm;
removing the associated documents whose first similarity is lower than a preset threshold value to obtain the associated documents after primary screening;
calculating a second similarity between the question text and each associated document after the primary screening based on a preset BLEU4 algorithm;
taking the two associated documents with the highest second similarity as the associated documents after the secondary screening;
and preprocessing the associated documents after the secondary screening to obtain the associated documents after the tertiary screening.
In one embodiment, preprocessing the associated document after the secondary screening to obtain the associated document after the tertiary screening includes:
judging whether the associated document after the secondary screening contains the question text;
if the associated document after the secondary screening contains the question text, taking the document paragraph containing the question text as the associated document after the tertiary screening;
if the associated document after the secondary screening does not contain the question text, acquiring the first sentence and the last sentence of each paragraph in the associated document after the secondary screening to obtain an associated sentence set;
calculating a third similarity between the question text and each sentence in the associated sentence set based on a preset BLEU4 algorithm;
and combining a preset number of sentences with the highest third similarity to obtain a combined paragraph, and taking the combined paragraph as the associated document after the tertiary screening.
In one embodiment, the pre-trained machine reading model is an improved BiDAF model;
the improved BiDAF model comprises: a question and associated document input layer, a word vector pre-training layer, a vector splicing layer, an attention-based fusion layer, an answer matching layer and an answer output layer, which are connected in sequence.
In one embodiment, after obtaining the output answer to the question, the method further includes:
acquiring word number information of a question answer output by a machine reading model;
checking the answer to the question according to the word number information of the answer to the question and a preset checking rule;
and if the verification is passed, returning a question answer to the user.
In a second aspect, an embodiment of the present application provides an intelligent question answering device based on machine reading understanding, and the device includes:
the acquisition module is used for acquiring a question text input by a user;
the screening module is used for searching for documents associated with the question text and screening the associated documents based on a preset three-level screening strategy to obtain screened associated documents;
and the output module is used for inputting the question text and the screened associated document into a pre-trained machine reading model to obtain an output question answer.
In a third aspect, an embodiment of the present application provides a computer device, including a memory and a processor, where the memory stores computer readable instructions, and when the computer readable instructions are executed by the processor, the processor is caused to execute the intelligent question-answering method based on machine reading understanding provided in the foregoing embodiment.
In a fourth aspect, the present application provides a storage medium storing computer readable instructions, which when executed by one or more processors, cause the one or more processors to execute the intelligent question-answering method based on machine reading understanding provided by the foregoing embodiments.
The technical scheme provided by the embodiment of the application can have the following beneficial effects:
according to the intelligent question-answering method provided by the embodiment of the application, the search range comprises Baidu Search, Baidu Baike (encyclopedia), Baidu Zhidao, Zhihu and the like, and the retrieved documents are screened by applying several data preprocessing methods. For example, documents highly relevant to the user question are obtained based on a three-level screening strategy, and the question and the associated documents are then fed into a machine reading understanding model to obtain a correct and effective answer, so that the accuracy of the machine reading understanding result is greatly improved.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the application.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present application and together with the description, serve to explain the principles of the application.
FIG. 1 is a diagram of an environment for implementing a machine-reading understanding-based intelligent question-answering method in accordance with an exemplary embodiment;
FIG. 2 is an internal block diagram of a computer device shown in accordance with an exemplary embodiment;
FIG. 3 is a flow diagram illustrating a machine-reading understanding-based intelligent question and answer method in accordance with one illustrative embodiment;
FIG. 4 is a flow diagram illustrating a machine-reading understanding-based intelligent question and answer method in accordance with an exemplary embodiment;
FIG. 5 is a schematic illustration of a machine reading understanding model in accordance with an exemplary embodiment;
FIG. 6 is a schematic diagram illustrating a machine-reading understanding-based intelligent question answering apparatus according to an exemplary embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the application and are not intended to limit it.
It will be understood that the terms "first," "second," and the like may be used herein to describe various elements, but these elements should not be limited by these terms. These terms are only used to distinguish one element from another. For example, a first field and algorithm determination module may be referred to as a second field and algorithm determination module, and similarly, a second field and algorithm determination module may be referred to as a first field and algorithm determination module, without departing from the scope of the present application.
Fig. 1 is a diagram illustrating an implementation environment of a smart question answering method based on machine reading understanding according to an exemplary embodiment, as shown in fig. 1, in which the implementation environment includes a server 110 and a terminal 120.
The server 110 (hereinafter also referred to as the computer device 110) is an intelligent question answering device based on machine reading understanding, for example a computer device such as a computer used by a technician, and an intelligent question answering tool is installed on it. The terminal 120 is installed with an application that requires intelligent question answering. When intelligent question answering needs to be provided, a technician may send a request for providing intelligent question answering at the computer device 110, the request carrying a request identifier. The computer device 110 receives the request, obtains the intelligent question answering method based on machine reading understanding stored in the computer device 110, and then drives the dialogue management engine platform to complete the man-machine dialogue.
It should be noted that the terminal 120 and the computer device 110 may be, but are not limited to, a smart phone, a tablet computer, a notebook computer, a desktop computer, and the like. The computer device 110 and the terminal 120 may be connected through Bluetooth, USB (Universal Serial Bus), or other communication connection methods, which is not limited herein.
FIG. 2 is an internal block diagram of a computer device shown in accordance with an example embodiment. As shown in fig. 2, the computer device includes a processor, a non-volatile storage medium, a memory, and a network interface connected through a system bus. The non-volatile storage medium of the computer device stores an operating system, a database and computer readable instructions, the database can store control information sequences, and when the computer readable instructions are executed by a processor, the processor can realize an intelligent question answering method based on machine reading understanding. The processor of the computer device is used for providing calculation and control capability and supporting the operation of the whole computer device. The memory of the computer device may have stored therein computer readable instructions that, when executed by the processor, cause the processor to perform a machine-read understanding-based intelligent question-answering method. The network interface of the computer device is used for connecting and communicating with the terminal. It will be appreciated by those skilled in the art that the configuration shown in fig. 2 is a block diagram of only a portion of the configuration associated with the present application, and is not intended to limit the computing device to which the present application may be applied, and that a particular computing device may include more or fewer components than shown, or may combine certain components, or have a different arrangement of components.
The intelligent question-answering method based on machine reading understanding provided by the embodiment of the application will be described in detail below with reference to FIG. 3 to FIG. 5. The method may be implemented by a computer program executable on a data transmission device based on the von Neumann architecture. The computer program may be integrated into an application or may run as a separate tool-like application.
Referring to fig. 3, a schematic flow chart of an intelligent question answering method based on machine reading understanding is provided for the embodiment of the present application, and as shown in fig. 3, the method of the embodiment of the present application may include the following steps:
S301, acquiring a question text input by the user.
The intelligent question-answering method provided by the embodiment of the application can be applied to a variety of artificial-intelligence question-answering scenarios, for example a customer service scenario in the insurance industry, a knowledge question-answering scenario, a traditional Chinese medicine consultation scenario and the like, which is not specifically limited by the embodiment of the application.
First, a question text input by a user is received. In one possible implementation, voice information of the user is received and converted into the question text based on speech recognition technology. Alternatively, text information input by the user is received, and the question text is obtained from the received text information.
S302, searching the document associated with the question text, and screening the associated document based on a preset three-level screening strategy to obtain the screened associated document.
Alternatively, the knowledge graph may also be used to search for answers directly before finding documents associated with the question text.
Specifically, the principle of the knowledge graph is to establish the relationship among the entities by establishing a huge graph system, and the main processes of the knowledge graph comprise knowledge modeling, knowledge extraction, knowledge fusion, knowledge storage, knowledge calculation, knowledge application and the like. Once the knowledge graph is established, the system can easily extract the relationship among all the entities. Therefore, the domain knowledge graph can be established and stored in advance, for example, the knowledge graph of the traditional Chinese medicine domain can be established and stored, then the answer corresponding to the question text is searched from the preset knowledge graph, and if the answer corresponding to the question text is searched, the searched answer to the question is directly returned to the user. And if the answer corresponding to the question text cannot be found, searching the document associated with the question text.
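The lookup-then-fallback behaviour described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the graph is reduced to a hypothetical dictionary keyed by (entity, relation) pairs, and the step that extracts an entity and a relation from the question text is assumed to have happened already, since the source does not describe it.

```python
from typing import Callable, Dict, Optional, Tuple

# Hypothetical toy representation: (entity, relation) -> stored answer.
KnowledgeGraph = Dict[Tuple[str, str], str]


def lookup_answer(graph: KnowledgeGraph, entity: str, relation: str) -> Optional[str]:
    """Return the stored answer, or None when the graph has no matching edge."""
    return graph.get((entity, relation))


def answer_or_fallback(
    graph: KnowledgeGraph,
    entity: str,
    relation: str,
    fallback: Callable[[], str],
) -> str:
    """Answer from the graph when possible, otherwise run the document search path."""
    answer = lookup_answer(graph, entity, relation)
    if answer is not None:
        return answer
    # No answer in the graph: hand over to the associated-document search.
    return fallback()
```

Passing the fallback as a callable keeps the document search from running when the graph already holds the answer.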
Optionally, the Baidu knowledge graph is a knowledge base used by Baidu that aggregates various resource information. In one embodiment, the Baidu search engine may be invoked to run a Baidu search based on the question input by the customer, and if the search returns an answer, that answer is used directly as the answer to the customer's question.
By searching by using the knowledge graph, the answer can be directly and quickly obtained.
In one possible implementation, if the answer corresponding to the question text cannot be found according to the knowledge graph, the document associated with the question text is found.
Optionally, a preset word segmentation algorithm may be used to segment the question text and extract the question keywords, for example a word segmentation method based on string matching, on understanding, on statistics, on rules, or on word tagging. The words obtained after segmentation are then used to extract the question keywords. For example, if the question text is "help me check the dietary contraindications for people with a yin-deficiency constitution", the obtained question keywords may be "yin-deficiency constitution" and "dietary contraindications". The associated-document search may also be conducted based on the entire question text.
Further, searching is carried out based on the question keywords and a preset first search engine to obtain a first preset number of top-ranked associated documents. For example, the obtained question keywords are input into Baidu Zhidao for searching, and the first 5 search results are used as the preliminarily retrieved associated documents. Searching is also carried out based on the question keywords and a preset second search engine to obtain a second preset number of top-ranked associated documents. For example, the obtained question keywords are input into Zhihu for searching, and the first 5 search results are used as the preliminarily retrieved associated documents. The embodiment of the application does not specifically limit the preset search engines or the preset numbers, which can be set according to actual conditions.
By searching both Baidu Zhidao and Zhihu and taking the top-ranked retrieved documents as the associated documents, the search range can be widened and the accuracy of the answers can be further improved.
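As one illustration of the string-matching family of segmentation methods mentioned above, the sketch below uses forward maximum matching against a dictionary and then drops stop words and single characters to obtain keywords. The vocabulary, the stop-word set and the keyword rule are all assumptions made for this example; the source does not say which segmentation algorithm or keyword rule is actually used.

```python
from typing import List, Set


def forward_max_match(text: str, vocab: Set[str]) -> List[str]:
    """Segment text greedily, always taking the longest dictionary word that fits."""
    max_len = max((len(word) for word in vocab), default=1)
    words: List[str] = []
    i = 0
    while i < len(text):
        # Try the longest window first and shrink until a dictionary word matches.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in vocab:
                # A single character is emitted as-is when nothing longer matches.
                words.append(piece)
                i += size
                break
    return words


def extract_keywords(words: List[str], stop_words: Set[str]) -> List[str]:
    """Keep multi-character words that are not stop words (an assumed rule)."""
    return [w for w in words if len(w) > 1 and w not in stop_words]
```

With a hypothetical vocabulary of `{"阴虚体质", "人群", "饮食禁忌"}` and `"人群"` treated as a stop word, the string `"阴虚体质人群的饮食禁忌"` yields the two keywords `"阴虚体质"` and `"饮食禁忌"`. The Chinese strings are an illustrative rendering of the example question, not text taken from the source.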
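A minimal sketch of collecting the top results from two engines is shown below. The two search functions are hypothetical callables standing in for whatever engine clients are used (no real search API is assumed), and the de-duplication step is an addition of this example rather than something the source describes.

```python
from typing import Callable, List

# Hypothetical engine interface: keywords in, ranked documents out.
SearchFn = Callable[[List[str]], List[str]]


def collect_candidates(
    keywords: List[str],
    first_engine: SearchFn,
    second_engine: SearchFn,
    first_n: int = 5,
    second_n: int = 5,
) -> List[str]:
    """Take the top results from each engine and merge them, preserving order."""
    candidates = first_engine(keywords)[:first_n] + second_engine(keywords)[:second_n]
    seen = set()
    unique: List[str] = []
    for doc in candidates:
        # Assumed extra step: skip a document both engines returned.
        if doc not in seen:
            seen.add(doc)
            unique.append(doc)
    return unique
```

The defaults of 5 and 5 follow the example numbers in the text; both are parameters because the preset numbers are left open.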
And further, screening the associated documents based on a preset three-level screening strategy to obtain the screened associated documents.
Specifically, a first similarity between the question text and the title of each associated document is calculated based on a preset word shift (word mover's distance) algorithm, and the associated documents whose first similarity is lower than a preset threshold value are removed to obtain the associated documents after primary screening.
After a plurality of associated documents have been obtained from Baidu Zhidao and Zhihu, primary screening is carried out on them first: the first similarity between the question text and the title of each associated document is calculated based on the word shift algorithm, and the associated documents with a first similarity lower than 55% are removed to obtain the associated documents after the primary screening. The value of the preset threshold is not specifically limited and can be set according to actual conditions.
The word shift algorithm measures the similarity between texts by calculating the word mover's distance. The words contained in one document are "moved" to the words in another document, and the minimum of the total distance produced by this "moving" process is taken as the word mover's distance. For the specific calculation method of the word mover's distance, reference may be made to the prior art.
Further, a second similarity between the question text and each associated document after the primary screening is calculated based on a preset BLEU4 algorithm, and the two associated documents with the highest second similarity are taken as the associated documents after the secondary screening. For the specific calculation method of the BLEU4 algorithm, reference may be made to the prior art.
Further, the associated document after the secondary screening is preprocessed: it is judged whether the associated document after the secondary screening contains the question text, and if it does, the document paragraph containing the question text is taken as the associated document after the tertiary screening.
If the associated document after the secondary screening does not contain the question text, the first sentence and the last sentence of each paragraph in it are acquired to obtain an associated sentence set, a third similarity between the question text and each sentence in the associated sentence set is calculated based on a preset BLEU4 algorithm, a preset number of sentences with the highest third similarity are combined to obtain a combined paragraph, and the combined paragraph is taken as the associated document after the tertiary screening. For example, the 5 sentences with the highest third similarity are merged to obtain the associated document after the tertiary screening.
Three-level screening of the massive text documents returned by the plurality of search engines reduces and ranks the associated documents. This reduces the number of data items, the data volume and the span of the query, so the query responds quickly and the efficiency of question-answer queries is improved.
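To make the "moving" idea concrete, the sketch below computes the relaxed word mover's distance, a cheaper lower bound in which each word is moved entirely to its nearest counterpart, instead of solving the full optimal-transport problem. This is a swapped-in simplification, not the exact distance. The embedding table, the conversion `1 / (1 + distance)` from distance to similarity, and the use of 0.55 as the threshold on that scale are all assumptions of this example; the source only states that documents below 55% similarity are removed.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Embeddings = Dict[str, Sequence[float]]


def _one_way_cost(src: List[str], dst: List[str], emb: Embeddings) -> float:
    """Move every source word to its nearest destination word, weighted by frequency."""
    counts = Counter(src)
    total = sum(counts.values())
    targets = set(dst)
    return sum(
        (count / total) * min(math.dist(emb[word], emb[other]) for other in targets)
        for word, count in counts.items()
    )


def relaxed_wmd(a: List[str], b: List[str], emb: Embeddings) -> float:
    """Relaxed word mover's distance: the larger of the two one-way costs."""
    return max(_one_way_cost(a, b, emb), _one_way_cost(b, a, emb))


def wmd_similarity(a: List[str], b: List[str], emb: Embeddings) -> float:
    """Assumed mapping from distance to a similarity in (0, 1]."""
    return 1.0 / (1.0 + relaxed_wmd(a, b, emb))


def first_level_filter(
    question_words: List[str],
    titled_docs: List[Tuple[List[str], str]],
    emb: Embeddings,
    threshold: float = 0.55,
) -> List[str]:
    """Keep documents whose title similarity reaches the threshold."""
    return [
        doc
        for title_words, doc in titled_docs
        if wmd_similarity(question_words, title_words, emb) >= threshold
    ]
```

Both word lists are assumed to be non-empty and fully covered by the embedding table. An exact word mover's distance would replace `relaxed_wmd` with a linear-programming or optimal-transport solver.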
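The second-level ranking can be sketched with a small sentence-level BLEU-4 function: the geometric mean of clipped 1-gram to 4-gram precisions multiplied by a brevity penalty. Several choices here are assumptions not stated in the source: add-one smoothing of each precision (so a single missing n-gram order does not zero the score), treating each document as the candidate and the question as the reference, and tokenising however the caller sees fit (for Chinese text, per character would be one option).

```python
import math
from collections import Counter
from typing import List, Sequence


def _ngrams(tokens: Sequence[str], n: int) -> Counter:
    """Count all contiguous n-grams in a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu4(candidate: Sequence[str], reference: Sequence[str]) -> float:
    """Sentence-level BLEU-4 with add-one smoothing (an assumed smoothing choice)."""
    if not candidate or not reference:
        return 0.0
    log_sum = 0.0
    for n in range(1, 5):
        cand = _ngrams(candidate, n)
        ref = _ngrams(reference, n)
        # Clip each candidate n-gram count by its count in the reference.
        clipped = sum(min(count, ref[gram]) for gram, count in cand.items())
        total = sum(cand.values())
        log_sum += 0.25 * math.log((clipped + 1) / (total + 1))
    if len(candidate) >= len(reference):
        brevity = 1.0
    else:
        brevity = math.exp(1 - len(reference) / len(candidate))
    return brevity * math.exp(log_sum)


def second_level_filter(
    question_tokens: Sequence[str],
    docs: List[Sequence[str]],
    keep: int = 2,
) -> List[Sequence[str]]:
    """Rank documents by BLEU-4 against the question and keep the best ones."""
    ranked = sorted(docs, key=lambda doc: bleu4(doc, question_tokens), reverse=True)
    return ranked[:keep]
```

The default `keep=2` follows the text, which retains the two highest-scoring documents.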
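The two branches of the third-level screening can be sketched as below. The similarity function is passed in as a parameter so that any BLEU-4 implementation can be plugged in. The sentence splitter, the representation of a document as a list of paragraph strings, and the decision to join the selected sentences in rank order are assumptions of this example.

```python
import re
from typing import Callable, List


def split_sentences(paragraph: str) -> List[str]:
    """Naive splitter on common Western and Chinese sentence-ending punctuation."""
    parts = re.findall(r"[^.!?。！？]+[.!?。！？]?", paragraph)
    return [part.strip() for part in parts if part.strip()]


def third_level_filter(
    question: str,
    paragraphs: List[str],
    similarity: Callable[[str, str], float],
    top_k: int = 5,
) -> str:
    """Return the paragraph containing the question, else a merged paragraph."""
    # Branch 1: a paragraph that literally contains the question text.
    for paragraph in paragraphs:
        if question in paragraph:
            return paragraph
    # Branch 2: collect the first and last sentence of every paragraph.
    candidates: List[str] = []
    for paragraph in paragraphs:
        sentences = split_sentences(paragraph)
        if not sentences:
            continue
        candidates.append(sentences[0])
        if len(sentences) > 1:
            candidates.append(sentences[-1])
    # Keep the top_k most similar sentences and merge them into one paragraph.
    ranked = sorted(candidates, key=lambda s: similarity(question, s), reverse=True)
    return " ".join(ranked[:top_k])
```

The default `top_k=5` mirrors the example number in the text.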
S303, inputting the question text and the screened associated document into a pre-trained machine reading model to obtain an output question answer.
In one embodiment, the machine reading model is first trained, and the machine reading model is an improved BiDAF model. As shown in FIG. 5, the improved BiDAF model includes: a question and associated document input layer, a word vector pre-training layer, a vector splicing layer, an attention-based fusion layer, an answer matching layer and an answer output layer, which are connected in sequence.
Specifically, the question text and the screened associated documents are input into the input layer of the model, and each character is mapped to a vector space through a CNN. The GloVe word vector pre-training layer then maps each word to a vector space. The character vectors are passed through a convolutional neural network layer and spliced with the word vectors, and the splicing of the character vectors and the word vectors is finally completed through a highway network. The output of the highway network of the vector splicing layer is then input into the attention-based fusion layer, where context2query and query2context processing, the two attention mechanisms applied, produces a matrix that fuses the document content and the question.
The fusion matrix is then passed to the answer matching layer, where matching is performed by a Bi-layer LSTM. Finally, the result is input into the answer output layer, where a pointer network predicts the answer by producing its start position and end position.
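For reference, the two attention directions of the fusion layer (context-to-query and query-to-context) can be written out as in the original BiDAF formulation. The source describes an improved variant whose exact equations are not given, so the following is offered as the standard form under that caveat. Here $h_t$ is the encoded document vector at position $t$, $u_j$ is the encoded question vector at position $j$, $w$ is a trainable weight vector, $\circ$ is the element-wise product and $[\cdot\,;\cdot]$ is concatenation.

```latex
\begin{aligned}
S_{tj} &= w^{\top}\,[\,h_t;\; u_j;\; h_t \circ u_j\,] \\
a_t &= \mathrm{softmax}_j\,(S_{tj}), \qquad \tilde{u}_t = \sum_j a_{tj}\, u_j \\
b &= \mathrm{softmax}_t\,\bigl(\max_j S_{tj}\bigr), \qquad \tilde{h} = \sum_t b_t\, h_t \\
g_t &= [\,h_t;\; \tilde{u}_t;\; h_t \circ \tilde{u}_t;\; h_t \circ \tilde{h}\,]
\end{aligned}
```

The second line is the context-to-query attention (which question words matter for each document position), the third line is the query-to-context attention (which document positions matter most for the question), and the vectors $g_t$ together form the fused matrix passed to the next layer.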
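Once the output layer has produced a probability for each position being the start and the end of the answer, a span still has to be chosen. The sketch below shows one common decoding rule, picking the pair with the highest product of probabilities subject to the start not coming after the end. It covers only this decoding step under that assumed rule; the pointer network that produces the two distributions is not shown, and the optional length cap is an addition of this example.

```python
from typing import Optional, Sequence, Tuple


def best_span(
    start_probs: Sequence[float],
    end_probs: Sequence[float],
    max_len: Optional[int] = None,
) -> Tuple[int, int]:
    """Return (start, end), inclusive, maximising start_probs[i] * end_probs[j] with i <= j."""
    best = (0, 0)
    best_score = -1.0
    for i, p_start in enumerate(start_probs):
        # Optionally cap the span length; otherwise scan to the end of the text.
        last = len(end_probs) if max_len is None else min(len(end_probs), i + max_len)
        for j in range(i, last):
            score = p_start * end_probs[j]
            if score > best_score:
                best_score = score
                best = (i, j)
    return best
```

The answer text is then the slice of the document tokens from `start` to `end`, both inclusive.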
By using the word vectors of the pre-training, the complexity of model training can be reduced, and the model training effect can be improved.
The question text and the screened associated documents are input into the trained machine reading model to obtain the question answer output by the model.
In an alternative embodiment, after the answer to the question output by the model is obtained, the answer to the question may be returned directly to the user.
In an optional embodiment, after the output question answer is obtained, word count information of the question answer output by the machine reading model is acquired, the question answer is verified according to the word count information and a preset verification rule, and if the verification passes, the question answer is returned to the user.
Specifically, a word-count limit for the answer may be set in advance, for example 50 words or fewer. The word count of the answer output by the model is then obtained; if it is 50 or fewer, the verification passes and the verified question answer is returned to the user. If the answer does not pass the verification, the conversation can be transferred to manual customer service. Limiting the answer word count filters out abnormal answers and further improves the accuracy of the answer output.
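The length check and the hand-off can be sketched in a few lines. Counting characters with `len` is an assumption (for Chinese text, "word count" is commonly a character count), the rejection of empty answers is an addition of this example, and the hand-off marker is a hypothetical constant.

```python
# Hypothetical marker telling the caller to route the user to a human agent.
HUMAN_HANDOFF = "TRANSFER_TO_HUMAN_AGENT"


def passes_length_check(answer: str, max_chars: int = 50) -> bool:
    """Accept non-empty answers of at most max_chars characters."""
    length = len(answer.strip())
    return 0 < length <= max_chars


def route_answer(answer: str, max_chars: int = 50) -> str:
    """Return the answer when it passes the check, otherwise the hand-off marker."""
    return answer if passes_length_check(answer, max_chars) else HUMAN_HANDOFF
```

The default limit of 50 follows the example value in the text and is a parameter because the rule is described as preset.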
To facilitate understanding of the intelligent question-answering method provided by the embodiments of the present application, the method is further described below with reference to Fig. 4. As shown in Fig. 4, the method specifically includes the following steps.
S401, acquiring a question text input by a user;
S402, searching a preset knowledge graph for an answer corresponding to the question text;
S403, judging whether an answer is found in the knowledge graph; if an answer is found, executing step S411 and returning the answer to the user; if no answer is found, executing step S404 to search for documents associated with the question text;
S404, searching for documents associated with the question text;
S405, screening the associated documents based on a preset three-level screening strategy to obtain screened associated documents;
S406, inputting the question text and the screened associated documents into a pre-trained machine reading model to obtain an output answer;
S407, acquiring the word count of the output answer;
S408, checking the answer according to its word count and a preset checking rule;
S409, judging whether the check passes; if the check passes, executing step S411 and returning the answer to the user; if the check fails, executing step S410 and transferring to a human customer service agent;
S410, transferring to a human customer service agent;
S411, returning the answer to the user.
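The control flow of steps S401 to S411 can be sketched as below. This is only an outline under stated assumptions: every component (knowledge-graph lookup, search, screening, reading model, check) is passed in as a hypothetical callable, and the returned dictionary shape is invented for illustration.

```python
from typing import Callable, List, Optional


def answer_question(
    question: str,                                          # S401: question text from the user
    kg_lookup: Callable[[str], Optional[str]],
    search_documents: Callable[[str], List[str]],
    screen_documents: Callable[[str, List[str]], List[str]],
    read_model: Callable[[str, List[str]], str],
    check_answer: Callable[[str], bool],
) -> dict:
    kg_answer = kg_lookup(question)                         # S402
    if kg_answer is not None:                               # S403
        return {"source": "knowledge_graph", "answer": kg_answer}   # S411
    documents = search_documents(question)                  # S404
    screened = screen_documents(question, documents)        # S405
    answer = read_model(question, screened)                 # S406
    if check_answer(answer):                                # S407 to S409
        return {"source": "reading_model", "answer": answer}        # S411
    return {"source": "human_agent", "answer": None}        # S410
```

Passing the components as callables keeps the branch structure of Fig. 4 visible while leaving each stage replaceable.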
According to the intelligent question-answering method based on the knowledge graph and the machine reading understanding model, answers are first looked up in the knowledge graph, which improves query efficiency. If no answer is found in the knowledge graph, related documents are retrieved through several search sources, including Baidu Search, Baidu Baike, Baidu Zhidao, and Zhihu, and the retrieved documents are screened using several data preprocessing methods. For example, documents highly relevant to the user's question are obtained based on the three-level screening strategy, and the question and the associated documents are then fed into the machine reading understanding model to obtain a correct and valid answer, which greatly improves the accuracy of the machine reading understanding result.
The following are embodiments of the apparatus of the present application that may be used to perform embodiments of the method of the present application. For details which are not disclosed in the embodiments of the apparatus of the present application, reference is made to the embodiments of the method of the present application.
Referring to Fig. 6, a schematic structural diagram of an intelligent question-answering device based on machine reading understanding according to an exemplary embodiment of the present application is shown. As shown in Fig. 6, the intelligent question-answering device based on machine reading understanding may be integrated in the computer device 110, and may specifically include an obtaining module 601, a screening module 602, and an output module 603.
The obtaining module 601 is configured to obtain a question text input by a user;
the screening module 602 is configured to search for documents associated with the question text, and screen the associated documents based on a preset three-level screening strategy to obtain screened associated documents;
the output module 603 is configured to input the question text and the screened associated documents into a pre-trained machine reading model to obtain an output answer.
In one embodiment, the device further includes a knowledge graph searching module, configured to search a preset knowledge graph for an answer corresponding to the question text, return the answer if one is found, and search for documents associated with the question text if no answer is found.
In one embodiment, the screening module 602 is configured to perform word segmentation on the question text using a preset word segmentation algorithm, and extract question keywords;
searching with the question keywords in a preset first search engine to obtain the top-ranked first preset number of associated documents;
and searching with the question keywords in a preset second search engine to obtain the top-ranked second preset number of associated documents.
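The retrieval step above can be sketched as follows. The segmentation function, the stop-word filter used to pick keywords, and the engine callables are all hypothetical placeholders; the text only states that keywords are extracted after word segmentation and that each engine contributes a preset number of top-ranked documents.

```python
from typing import Callable, Iterable, List, Sequence, Set, Tuple

SearchEngine = Callable[[str], Sequence[str]]  # hypothetical: query -> ranked results


def extract_keywords(question: str,
                     segment: Callable[[str], Iterable[str]],
                     stopwords: Set[str]) -> List[str]:
    """Segment the question and drop stop words (the filter is an assumption)."""
    return [w for w in segment(question) if w not in stopwords]


def retrieve(question: str,
             segment: Callable[[str], Iterable[str]],
             stopwords: Set[str],
             engines: Sequence[Tuple[SearchEngine, int]]) -> List[str]:
    """Query each engine with the keywords and keep its top-N results."""
    query = " ".join(extract_keywords(question, segment, stopwords))
    results: List[str] = []
    for engine, top_n in engines:
        results.extend(list(engine(query))[:top_n])
    return results
```

Each `(engine, top_n)` pair corresponds to one search engine and its preset number of documents, so the two numbers may differ.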
In one embodiment, the screening module 602 is configured to calculate a first similarity between the question text and the title of each associated document based on a preset word mover's distance (WMD) algorithm;
removing the associated documents whose first similarity is lower than a preset threshold to obtain the associated documents after primary screening;
calculating a second similarity between the question text and each associated document after primary screening based on a preset BLEU4 algorithm;
taking the two associated documents with the highest second similarity as the associated documents after secondary screening;
and preprocessing the associated documents after secondary screening to obtain the associated documents after tertiary screening.
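The three screening levels above can be sketched as one function with pluggable similarity measures. The document shape (`title` and `body` keys) and all parameter names are assumptions made for illustration; the similarity functions and the third-level preprocessing are passed in rather than implemented here.

```python
from typing import Callable, Dict, List

Document = Dict[str, str]  # hypothetical shape: {"title": ..., "body": ...}


def three_level_screen(
    question: str,
    documents: List[Document],
    title_similarity: Callable[[str, str], float],
    body_similarity: Callable[[str, str], float],
    preprocess: Callable[[str, Document], str],
    threshold: float,
    top_k: int = 2,
) -> List[str]:
    # Level 1: drop documents whose title similarity is below the threshold.
    level1 = [d for d in documents
              if title_similarity(question, d["title"]) >= threshold]
    # Level 2: keep the top_k documents by body similarity (two in the text).
    level2 = sorted(level1,
                    key=lambda d: body_similarity(question, d["body"]),
                    reverse=True)[:top_k]
    # Level 3: reduce each remaining document to a short passage.
    return [preprocess(question, d) for d in level2]
```

Because each level only narrows the output of the previous one, the more expensive body comparison runs on fewer documents than the title comparison.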
In one embodiment, the screening module 602 is configured to determine whether an associated document after secondary screening contains the question text;
if the associated document after secondary screening contains the question text, taking the document paragraph containing the question text as the associated document after tertiary screening;
if the associated document after secondary screening does not contain the question text, acquiring the first sentence and the last sentence of each paragraph in the associated document after secondary screening to obtain an associated sentence set;
calculating a third similarity between the question text and each sentence in the associated sentence set based on a preset BLEU4 algorithm;
and combining a preset number of sentences with the highest third similarity into a combined paragraph, and taking the combined paragraph as the associated document after tertiary screening.
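The third-level preprocessing above can be sketched as follows. The paragraph delimiter (newline), the sentence splitter, the join characters, and the default of three sentences are assumptions; the similarity function is injected so that any BLEU4 implementation can be used.

```python
import re
from typing import Callable, List


def split_sentences(paragraph: str) -> List[str]:
    """Split on common Chinese and English sentence-ending punctuation."""
    parts = re.findall(r"[^。！？.!?]+[。！？.!?]*", paragraph)
    return [p.strip() for p in parts if p.strip()]


def tertiary_preprocess(question: str, body: str,
                        similarity: Callable[[str, str], float],
                        top_n: int = 3) -> str:
    paragraphs = [p for p in body.split("\n") if p.strip()]
    # Case 1: the document contains the question text verbatim.
    hits = [p for p in paragraphs if question in p]
    if hits:
        return "\n".join(hits)
    # Case 2: collect the first and last sentence of every paragraph.
    candidates: List[str] = []
    for p in paragraphs:
        sentences = split_sentences(p)
        if sentences:
            candidates.append(sentences[0])
            if len(sentences) > 1:
                candidates.append(sentences[-1])
    # Keep the top_n most similar sentences and merge them into one passage.
    ranked = sorted(candidates, key=lambda s: similarity(question, s), reverse=True)
    return " ".join(ranked[:top_n])
```

Taking only first and last sentences keeps the candidate set small, on the assumption that topic and summary sentences tend to sit at paragraph boundaries.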
In one embodiment, the pre-trained machine reading model is an improved BiDAF model;
the improved BiDAF model comprises, connected in sequence: a question and associated document input layer, a word vector pre-training layer, a vector concatenation layer, an attention-based fusion layer, an answer matching layer, and an answer output layer.
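The text does not detail how the answer output layer selects the answer. In standard BiDAF-style extractive reading, the model produces a start probability and an end probability for every document token and the answer is the span maximizing their product. The sketch below illustrates that decoding step only; whether the improved model decodes this way is an assumption, and `max_len` is a hypothetical parameter.

```python
from typing import Sequence, Tuple


def best_span(start_probs: Sequence[float],
              end_probs: Sequence[float],
              max_len: int = 50) -> Tuple[int, int, float]:
    """Return (start, end, score) maximizing start_probs[i] * end_probs[j], i <= j."""
    best_i, best_j, best_score = 0, 0, -1.0
    n = len(end_probs)
    for i, p_start in enumerate(start_probs):
        # Only consider spans of at most max_len tokens that end at or after i.
        for j in range(i, min(i + max_len, n)):
            score = p_start * end_probs[j]
            if score > best_score:
                best_i, best_j, best_score = i, j, score
    return best_i, best_j, best_score
```

The answer text would then be the document tokens from index `start` through index `end`, inclusive.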
In one embodiment, the device further comprises a verification module, configured to acquire the word count of the answer output by the machine reading model;
checking the answer according to its word count and a preset checking rule;
and if the check passes, returning the answer to the user.
It should be noted that when the intelligent question-answering device based on machine reading understanding provided by the above embodiment executes the intelligent question-answering method based on machine reading understanding, the division into the above functional modules is only an example. In practical applications, the functions may be allocated to different functional modules as needed; that is, the internal structure of the device may be divided into different functional modules to complete all or part of the functions described above. In addition, the device and the method provided by the above embodiments belong to the same concept; details of the implementation process are given in the method embodiments and are not repeated here.
In one embodiment, a computer device is provided, the computer device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, the processor implementing the following steps when executing the computer program: acquiring a question text input by a user; searching for documents associated with the question text, and screening the associated documents based on a preset three-level screening strategy to obtain screened associated documents; and inputting the question text and the screened associated documents into a pre-trained machine reading model to obtain an output answer.
In one embodiment, after obtaining the question text input by the user, the method further includes:
searching answers corresponding to the question texts from a preset knowledge graph;
if the answer corresponding to the question text is found, returning the found question answer;
and if the answer corresponding to the question text cannot be found, searching the document associated with the question text.
In one embodiment, finding a document associated with the question text comprises:
performing word segmentation on the question text using a preset word segmentation algorithm, and extracting question keywords;
searching with the question keywords in a preset first search engine to obtain the top-ranked first preset number of associated documents;
and searching with the question keywords in a preset second search engine to obtain the top-ranked second preset number of associated documents.
In one embodiment, screening the associated documents based on a preset three-level screening strategy to obtain screened associated documents includes:
calculating a first similarity between the question text and the title of each associated document based on a preset word mover's distance (WMD) algorithm;
removing the associated documents whose first similarity is lower than a preset threshold to obtain the associated documents after primary screening;
calculating a second similarity between the question text and each associated document after primary screening based on a preset BLEU4 algorithm;
taking the two associated documents with the highest second similarity as the associated documents after secondary screening;
and preprocessing the associated documents after secondary screening to obtain the associated documents after tertiary screening.
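The first-level title similarity can be illustrated with a simplified stand-in. Exact word mover's distance requires solving an optimal transport problem over word embeddings; the sketch below instead computes the relaxed lower bound, in which each word is matched to its nearest counterpart, and converts the distance to a similarity. The relaxation, the `1 / (1 + d)` conversion, and the toy embedding table are all assumptions made for illustration.

```python
import math
from typing import Dict, Sequence, Tuple

Vector = Tuple[float, ...]


def _dist(u: Vector, v: Vector) -> float:
    """Euclidean distance between two word vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def relaxed_wmd(tokens_a: Sequence[str], tokens_b: Sequence[str],
                embed: Dict[str, Vector]) -> float:
    """Relaxed word mover's distance with uniform word weights."""
    va = [embed[t] for t in tokens_a if t in embed]
    vb = [embed[t] for t in tokens_b if t in embed]
    if not va or not vb:
        return math.inf  # nothing comparable
    a_to_b = sum(min(_dist(u, v) for v in vb) for u in va) / len(va)
    b_to_a = sum(min(_dist(v, u) for u in va) for v in vb) / len(vb)
    return max(a_to_b, b_to_a)  # the tighter of the two lower bounds


def title_similarity(question_tokens: Sequence[str], title_tokens: Sequence[str],
                     embed: Dict[str, Vector]) -> float:
    """Map distance to a similarity in [0, 1]; higher means closer."""
    return 1.0 / (1.0 + relaxed_wmd(question_tokens, title_tokens, embed))
```

In practice the embeddings would come from a pre-trained word vector model, and an exact solver could replace `relaxed_wmd` without changing the thresholding logic.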
In one embodiment, preprocessing the associated documents after secondary screening to obtain the associated documents after tertiary screening includes:
determining whether an associated document after secondary screening contains the question text;
if the associated document after secondary screening contains the question text, taking the document paragraph containing the question text as the associated document after tertiary screening;
if the associated document after secondary screening does not contain the question text, acquiring the first sentence and the last sentence of each paragraph in the associated document after secondary screening to obtain an associated sentence set;
calculating a third similarity between the question text and each sentence in the associated sentence set based on a preset BLEU4 algorithm;
and combining a preset number of sentences with the highest third similarity into a combined paragraph, and taking the combined paragraph as the associated document after tertiary screening.
In one embodiment, the pre-trained machine reading model is an improved BiDAF model;
the improved BiDAF model comprises, connected in sequence: a question and associated document input layer, a word vector pre-training layer, a vector concatenation layer, an attention-based fusion layer, an answer matching layer, and an answer output layer.
In one embodiment, after obtaining the output answer to the question, the method further includes:
acquiring word number information of a question answer output by a machine reading model;
checking the answer to the question according to the word number information of the answer to the question and a preset checking rule;
and if the verification is passed, returning a question answer to the user.
In one embodiment, a storage medium is provided that stores computer-readable instructions that, when executed by one or more processors, cause the one or more processors to perform the following steps: acquiring a question text input by a user; searching for documents associated with the question text, and screening the associated documents based on a preset three-level screening strategy to obtain screened associated documents; and inputting the question text and the screened associated documents into a pre-trained machine reading model to obtain an output answer.
In one embodiment, after the obtaining of the question text input by the user, the method further includes:
searching answers corresponding to the question texts from a preset knowledge graph;
if the answer corresponding to the question text is found, returning the found question answer;
and if the answer corresponding to the question text cannot be found, searching the document associated with the question text.
In one embodiment, finding a document associated with the question text comprises:
performing word segmentation on the question text using a preset word segmentation algorithm, and extracting question keywords;
searching with the question keywords in a preset first search engine to obtain the top-ranked first preset number of associated documents;
and searching with the question keywords in a preset second search engine to obtain the top-ranked second preset number of associated documents.
In one embodiment, screening the associated documents based on a preset three-level screening strategy to obtain screened associated documents includes:
calculating a first similarity between the question text and the title of each associated document based on a preset word mover's distance (WMD) algorithm;
removing the associated documents whose first similarity is lower than a preset threshold to obtain the associated documents after primary screening;
calculating a second similarity between the question text and each associated document after primary screening based on a preset BLEU4 algorithm;
taking the two associated documents with the highest second similarity as the associated documents after secondary screening;
and preprocessing the associated documents after secondary screening to obtain the associated documents after tertiary screening.
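The BLEU4 similarity used in the second screening level can be sketched as a sentence-level score over token lists: the geometric mean of clipped 1-gram to 4-gram precisions multiplied by a brevity penalty. The add-one smoothing (which keeps short texts from scoring zero) and the decision about which text plays the candidate role are assumptions; the text does not specify either.

```python
import math
from collections import Counter
from typing import Sequence


def ngram_counts(tokens: Sequence[str], n: int) -> Counter:
    """Count all n-grams of length n in the token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu4(candidate: Sequence[str], reference: Sequence[str]) -> float:
    """Sentence-level BLEU-4 with add-one smoothing (smoothing is an assumption)."""
    if not candidate or not reference:
        return 0.0
    log_precision = 0.0
    for n in range(1, 5):
        cand = ngram_counts(candidate, n)
        ref = ngram_counts(reference, n)
        # Clip each candidate n-gram count by its count in the reference.
        matched = sum(min(count, ref[gram]) for gram, count in cand.items())
        total = sum(cand.values())
        log_precision += 0.25 * math.log((matched + 1) / (total + 1))
    c, r = len(candidate), len(reference)
    brevity_penalty = 1.0 if c >= r else math.exp(1 - r / c)
    return brevity_penalty * math.exp(log_precision)
```

For Chinese text the token lists could be individual characters or segmented words; the function is agnostic to that choice.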
In one embodiment, preprocessing the associated documents after secondary screening to obtain the associated documents after tertiary screening includes:
determining whether an associated document after secondary screening contains the question text;
if the associated document after secondary screening contains the question text, taking the document paragraph containing the question text as the associated document after tertiary screening;
if the associated document after secondary screening does not contain the question text, acquiring the first sentence and the last sentence of each paragraph in the associated document after secondary screening to obtain an associated sentence set;
calculating a third similarity between the question text and each sentence in the associated sentence set based on a preset BLEU4 algorithm;
and combining a preset number of sentences with the highest third similarity into a combined paragraph, and taking the combined paragraph as the associated document after tertiary screening.
In one embodiment, the pre-trained machine reading model is an improved BiDAF model;
the improved BiDAF model comprises, connected in sequence: a question and associated document input layer, a word vector pre-training layer, a vector concatenation layer, an attention-based fusion layer, an answer matching layer, and an answer output layer.
In one embodiment, after obtaining the output answer to the question, the method further includes:
acquiring word number information of a question answer output by a machine reading model;
checking the answer to the question according to the word number information of the answer to the question and a preset checking rule;
and if the verification is passed, returning a question answer to the user.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above may be implemented by a computer program, which may be stored in a computer-readable storage medium and which, when executed, may include the processes of the method embodiments described above. The storage medium may be a non-volatile storage medium such as a magnetic disk, an optical disk, or a Read-Only Memory (ROM), or may be a Random Access Memory (RAM).
For brevity, not all possible combinations of the technical features in the above embodiments are described; however, any combination of these technical features should be considered within the scope of the present disclosure as long as the combined features do not contradict one another.
The above examples express only several embodiments of the present application. Although they are described specifically and in detail, they are not to be construed as limiting the scope of the present application. It should be noted that a person skilled in the art can make several variations and modifications without departing from the concept of the present application, and these all fall within the scope of protection of the present application. Therefore, the scope of protection of the present patent shall be subject to the appended claims.

Claims (10)

1. An intelligent question-answering method based on machine reading understanding, which is characterized by comprising the following steps:
acquiring a question text input by a user;
searching for documents associated with the question text, and screening the associated documents based on a preset three-level screening strategy to obtain screened associated documents;
and inputting the question text and the screened associated document into a pre-trained machine reading model to obtain an output question answer.
2. The method of claim 1, after obtaining the question text input by the user, further comprising:
searching answers corresponding to the question texts from a preset knowledge graph;
if the answer corresponding to the question text is found, returning the found question answer;
and if the answer corresponding to the question text cannot be found, searching a document associated with the question text.
3. The method of claim 1, wherein finding a document associated with the question text comprises:
performing word segmentation on the question text using a preset word segmentation algorithm, and extracting question keywords;
searching with the question keywords in a preset first search engine to obtain the top-ranked first preset number of associated documents;
and searching with the question keywords in a preset second search engine to obtain the top-ranked second preset number of associated documents.
4. The method of claim 3, wherein screening the associated documents based on a preset three-level screening strategy to obtain screened associated documents comprises:
calculating a first similarity between the question text and the title of each associated document based on a preset word mover's distance (WMD) algorithm;
removing the associated documents whose first similarity is lower than a preset threshold to obtain the associated documents after primary screening;
calculating a second similarity between the question text and each associated document after primary screening based on a preset BLEU4 algorithm;
taking the two associated documents with the highest second similarity as the associated documents after secondary screening;
and preprocessing the associated documents after secondary screening to obtain the associated documents after tertiary screening.
5. The method of claim 4, wherein preprocessing the associated documents after secondary screening to obtain the associated documents after tertiary screening comprises:
determining whether an associated document after secondary screening contains the question text;
if the associated document after secondary screening contains the question text, taking the document paragraph containing the question text as the associated document after tertiary screening;
if the associated document after secondary screening does not contain the question text, acquiring the first sentence and the last sentence of each paragraph in the associated document after secondary screening to obtain an associated sentence set;
calculating a third similarity between the question text and each sentence in the associated sentence set based on a preset BLEU4 algorithm;
and combining a preset number of sentences with the highest third similarity into a combined paragraph, and taking the combined paragraph as the associated document after tertiary screening.
6. The method of claim 1, wherein the pre-trained machine reading model is an improved BiDAF model;
the improved BiDAF model comprises, connected in sequence: a question and associated document input layer, a word vector pre-training layer, a vector concatenation layer, an attention-based fusion layer, an answer matching layer, and an answer output layer.
7. The method of claim 1, after obtaining the output answer, further comprising:
acquiring the word count of the answer output by the machine reading model;
checking the answer according to its word count and a preset checking rule;
and if the check passes, returning the answer to the user.
8. An intelligent question-answering device based on machine reading understanding, the device comprising:
the acquisition module is used for acquiring a question text input by a user;
the screening module is used for searching for documents associated with the question text and screening the associated documents based on a preset three-level screening strategy to obtain screened associated documents;
and the output module is used for inputting the question text and the screened associated documents into a pre-trained machine reading model to obtain output question answers.
9. A computer device comprising a memory and a processor, the memory having stored therein computer-readable instructions that, when executed by the processor, cause the processor to perform the intelligent question-answering method based on machine reading understanding according to any one of claims 1 to 7.
10. A storage medium storing computer readable instructions which, when executed by one or more processors, cause the one or more processors to perform the machine-reading-understanding-based intelligent question answering method according to any one of claims 1 to 7.
CN202210415733.6A 2022-04-20 2022-04-20 Intelligent question-answering method, device, equipment and medium based on machine reading understanding Pending CN114780700A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210415733.6A CN114780700A (en) 2022-04-20 2022-04-20 Intelligent question-answering method, device, equipment and medium based on machine reading understanding

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210415733.6A CN114780700A (en) 2022-04-20 2022-04-20 Intelligent question-answering method, device, equipment and medium based on machine reading understanding

Publications (1)

Publication Number Publication Date
CN114780700A true CN114780700A (en) 2022-07-22

Family

ID=82430820

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210415733.6A Pending CN114780700A (en) 2022-04-20 2022-04-20 Intelligent question-answering method, device, equipment and medium based on machine reading understanding

Country Status (1)

Country Link
CN (1) CN114780700A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117131283A (en) * 2023-10-27 2023-11-28 知学云(北京)科技股份有限公司 Intelligent question-answering method and system based on asynchronous service
CN117131283B (en) * 2023-10-27 2024-03-19 知学云(北京)科技股份有限公司 Intelligent question-answering method and system based on asynchronous service

Similar Documents

Publication Publication Date Title
CN108804641B (en) Text similarity calculation method, device, equipment and storage medium
CN110399457B (en) Intelligent question answering method and system
CN111222305B (en) Information structuring method and device
CN110929038B (en) Knowledge graph-based entity linking method, device, equipment and storage medium
CN108280114B (en) Deep learning-based user literature reading interest analysis method
CN111444344B (en) Entity classification method, entity classification device, computer equipment and storage medium
CN112800170A (en) Question matching method and device and question reply method and device
CN112035599B (en) Query method and device based on vertical search, computer equipment and storage medium
CN113614711A (en) Embedded based image search retrieval
CN112115232A (en) Data error correction method and device and server
CN111274822A (en) Semantic matching method, device, equipment and storage medium
CN111401065A (en) Entity identification method, device, equipment and storage medium
CN112307182A (en) Question-answering system-based pseudo-correlation feedback extended query method
CN112613321A (en) Method and system for extracting entity attribute information in text
CN115374781A (en) Text data information mining method, device and equipment
CN110795544B (en) Content searching method, device, equipment and storage medium
CN114037007A (en) Data set construction method and device, computer equipment and storage medium
CN114491079A (en) Knowledge graph construction and query method, device, equipment and medium
CN114780700A (en) Intelligent question-answering method, device, equipment and medium based on machine reading understanding
CN116821307B (en) Content interaction method, device, electronic equipment and storage medium
CN111783425B (en) Intention identification method based on syntactic analysis model and related device
CN113297355A (en) Method, device, equipment and medium for enhancing labeled data based on countermeasure interpolation sequence
CN113610626A (en) Bank credit risk identification knowledge graph construction method and device, computer equipment and computer readable storage medium
CN117216214A (en) Question and answer extraction generation method, device, equipment and medium
CN117093686A (en) Intelligent question-answer matching method, device, terminal and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination