CN110851484A - Method and device for obtaining multi-index question answers - Google Patents


Publication number
CN110851484A
Authority
CN
China
Prior art keywords
determining
word segmentation
problem information
model
semantic
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911106796.8A
Other languages
Chinese (zh)
Inventor
孙子钧
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Shannon Huiyu Technology Co Ltd
Original Assignee
Beijing Shannon Huiyu Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Shannon Huiyu Technology Co Ltd
Priority to CN201911106796.8A
Publication of CN110851484A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2457 Query processing with adaptation to user needs
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/248 Presentation of query results

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a method and a device for obtaining answers to multi-index questions. The method comprises: acquiring question information input by a user; performing word segmentation on the question information based on a multi-modal model, determining a word segmentation result, and extracting a plurality of indexes from the question information; establishing dependency relationships between words according to the word segmentation result, and converting the question information, according to the dependency relationships, into a machine-language query statement for each index; and determining the query result for each index and displaying the query results of all the indexes in the same coordinate system. Because semantic analysis is performed across multiple modalities, the word segmentation result is more accurate; because the dependency relationships between words describe the semantic information of the sentence completely, the query statements are more accurate, which improves query accuracy.

Description

Method and device for obtaining multi-index question answers
Technical Field
The invention relates to the technical field of natural language processing, in particular to a method and a device for acquiring answers to multi-index questions.
Background
At present, the mainstream method for searching financial questions is database retrieval based on keyword matching. A database stores a massive number of fields and records; when a user asks a question, a traditional word segmentation algorithm extracts keywords from the question, and the database is then queried with those keywords to find results.
This search technology has the following main problems and disadvantages:
Keyword matching returns a large number of documents that contain the keywords, and the user must read and screen them manually to find an answer, which is inefficient. Keyword matching also does not understand the question, so it is difficult to present an accurate answer: the returned results are often merely relevant and do not answer the question directly. In addition, conventional search methods generally support searching for only one index at a time and perform poorly when several indexes must be searched simultaneously.
Disclosure of Invention
To solve the above problems, embodiments of the present invention provide a method and an apparatus for obtaining answers to multi-index questions.
In a first aspect, an embodiment of the present invention provides a method for obtaining answers to multi-index questions, including:
acquiring question information input by a user, wherein the question information comprises a plurality of indexes with the same attribute;
performing word segmentation on the question information based on a multi-modal model, determining a word segmentation result, and extracting the plurality of indexes from the question information, wherein the multi-modal model comprises at least two of a word model, a character model, a pinyin model and a glyph model;
establishing dependency relationships between words according to the word segmentation result, and converting the question information, according to the dependency relationships, into a machine-language query statement corresponding to each index;
and querying a corresponding database according to the query statements, determining a query result corresponding to each index, and displaying the query results of all the indexes in the same coordinate system.
In a second aspect, an embodiment of the present invention further provides a device for obtaining answers to multi-index questions, including:
a question acquisition module, used for acquiring question information input by a user, wherein the question information comprises a plurality of indexes with the same attribute;
a word segmentation module, used for performing word segmentation on the question information based on a multi-modal model, determining a word segmentation result, and extracting the plurality of indexes from the question information, wherein the multi-modal model comprises at least two of a word model, a character model, a pinyin model and a glyph model;
a processing module, used for establishing dependency relationships between words according to the word segmentation result, and converting the question information, according to the dependency relationships, into a machine-language query statement corresponding to each index;
and a query display module, used for querying a corresponding database according to the query statements, determining a query result corresponding to each index, and displaying the query results of all the indexes in the same coordinate system.
In the solution provided by the first aspect of the embodiments of the present invention, the multi-modal model is used to segment the question information input by the user, so that multiple indexes with the same attribute can be extracted and the dependency relationships between words can be established. The question information is then converted into machine-language query statements according to the dependency relationships, so that the query result of each index can be determined quickly and the query results of several indexes can be displayed at the same time. Because word segmentation draws on multiple modalities, semantic analysis is more accurate and so is the word segmentation result. Because the dependency relationships between words describe the semantic information of the sentence completely, the query statements are more accurate and return more accurate results. Extracting the indexes with the same attribute and converting the question information into one query statement per index simplifies the original question and allows the result for each index to be queried precisely. Displaying the query results of all indexes together makes it easy for the user to compare them.
In order to make the aforementioned and other objects, features and advantages of the present invention comprehensible, preferred embodiments accompanied with figures are described in detail below.
Drawings
In order to illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in the description are briefly introduced below. The drawings described below show only some embodiments of the present invention; those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flowchart of a method for obtaining answers to multi-index questions according to an embodiment of the present invention;
Fig. 2 is a flowchart of a specific method for performing word segmentation based on a multi-modal model in the method for obtaining answers to multi-index questions according to an embodiment of the present invention;
Fig. 3 is a schematic diagram of a bidirectional long short-term memory (BiLSTM) recurrent neural network model according to an embodiment of the present invention;
Fig. 4 is a schematic diagram of a Stack-LSTM based syntactic dependency tree model according to an embodiment of the present invention;
Fig. 5 is a schematic diagram of a display of query results according to an embodiment of the present invention;
Fig. 6 is a schematic structural diagram of a device for obtaining answers to multi-index questions according to an embodiment of the present invention.
Detailed Description
In the description of the present invention, it is to be understood that the terms "center", "longitudinal", "lateral", "length", "width", "thickness", "upper", "lower", "front", "rear", "left", "right", "vertical", "horizontal", "top", "bottom", "inner", "outer", "clockwise", "counterclockwise", and the like, indicate orientations and positional relationships based on those shown in the drawings, and are used only for convenience of description and simplicity of description, and do not indicate or imply that the device or element being referred to must have a particular orientation, be constructed and operated in a particular orientation, and thus, should not be considered as limiting the present invention.
Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include one or more of that feature. In the description of the present invention, "a plurality" means two or more unless specifically defined otherwise.
In the present invention, unless otherwise expressly specified or limited, the terms "mounted," "connected," "secured," and the like are to be construed broadly and can, for example, be fixedly connected, detachably connected, or integrally connected; can be mechanically or electrically connected; they may be connected directly or indirectly through intervening media, or they may be interconnected between two elements. The specific meanings of the above terms in the present invention can be understood by those skilled in the art according to specific situations.
The embodiment of the invention provides a method for obtaining answers to multi-index questions. Referring to Fig. 1, the method includes:
Step 101: Acquire question information input by a user, wherein the question information comprises a plurality of indexes with the same attribute.
In the embodiment of the present invention, "question information" refers to whatever the user inputs when making a query; it does not need to be phrased as a question. For example, a user who wants to query China's GDP may input only "China's GDP" or may input the question "What is China's GDP?". The question information includes one or more indexes. In this embodiment, an "index" is a piece of content contained in the question information; for example, the question information "China's GDP" includes two indexes, "China" and "GDP". Each index has corresponding attributes, including semantic attributes (such as person name or place name) and grammatical attributes (such as subject or adjective); this embodiment does not limit them. If the question information includes a plurality of indexes and some or all of them have the same attribute, the question is a multi-index question, and the query process of the embodiment of the invention applies. For example, the question information "GDP of China, the United States and Japan" includes four indexes, "China", "United States", "Japan" and "GDP"; "China", "United States" and "Japan" are all country names, so these three indexes have the same attribute and the question is a multi-index question. Alternatively, the question information "China's per capita GDP and per capita steel consumption" includes three indexes, "China", "per capita GDP" and "per capita steel consumption"; "per capita GDP" and "per capita steel consumption" are both terms qualified by "China", so they can be regarded as two indexes with the same attribute.
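As a minimal sketch of the definition above, the following Python groups already-extracted indexes by attribute and treats a question as multi-index when any attribute is shared by two or more indexes. The `Index` type, the attribute labels ("country", "metric") and the function names are illustrative assumptions; the patent does not prescribe a data structure.

```python
from collections import defaultdict
from typing import NamedTuple


class Index(NamedTuple):
    """One index extracted from the question (hypothetical structure)."""
    text: str
    attribute: str  # e.g. "country" or "metric"; labels are assumed


def group_by_attribute(indexes):
    """Map each attribute to the list of index texts that carry it."""
    groups = defaultdict(list)
    for index in indexes:
        groups[index.attribute].append(index.text)
    return dict(groups)


def is_multi_index(indexes):
    """True when at least two indexes share the same attribute."""
    return any(len(texts) >= 2 for texts in group_by_attribute(indexes).values())


question = [
    Index("China", "country"),
    Index("United States", "country"),
    Index("Japan", "country"),
    Index("GDP", "metric"),
]
groups = group_by_attribute(question)
```

Here `groups["country"]` holds the three country names, so `is_multi_index(question)` is true, whereas a question with only "China" and "GDP" would not qualify.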
Step 102: Perform word segmentation on the question information based on a multi-modal model, determine a word segmentation result, and extract the plurality of indexes from the question information, wherein the multi-modal model comprises at least two of a word model, a character model, a pinyin model and a glyph model.
In the embodiment of the invention, the multi-modal model is built on words, characters, pinyin and glyphs, and natural language processing is carried out along several dimensions, which gives higher accuracy than traditional single-modality processing. Specifically, a "word" is a segment determined by a conventional word segmentation model; a "character" is the basic unit of the language, such as a single Chinese character; "pinyin" is an attribute specific to Chinese characters, and the pronunciation of each character also carries some semantic information, for example in the case of polyphonic characters; a "glyph" is the written shape of a character, which for pictographic scripts such as Chinese may also carry specific semantics. For example, the original text 我是一个中国人 ("I am a Chinese person") is divided into 我/是/一个/中国人 when processed by word, into 我/是/一/个/中/国/人 when processed by character, and into wo/shi/yi/ge/zhong/guo/ren when processed by pinyin. For the glyph, each character is mapped to an image of the Chinese character and then processed accordingly.
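The first three views of the example sentence can be sketched in a few lines of Python. This is only an illustration: it assumes the sentence is 我是一个中国人 (which is what the pinyin sequence wo/shi/yi/ge/zhong/guo/ren spells), takes the word-level split as given rather than computing it, and uses a toy pinyin table limited to these seven characters. The glyph view is omitted because it requires rendering characters to images.

```python
# Toy pinyin table covering only this sentence (assumed, tones omitted).
PINYIN = {"我": "wo", "是": "shi", "一": "yi", "个": "ge",
          "中": "zhong", "国": "guo", "人": "ren"}


def word_view(words):
    """Word-level units, as produced by some upstream segmenter."""
    return list(words)


def character_view(sentence):
    """Character-level units: one entry per Chinese character."""
    return list(sentence)


def pinyin_view(sentence):
    """Pinyin units: one syllable per character, via the toy table."""
    return [PINYIN[ch] for ch in sentence]


sentence = "我是一个中国人"
views = {
    "word": word_view(["我", "是", "一个", "中国人"]),
    "character": character_view(sentence),
    "pinyin": pinyin_view(sentence),
}
```

Each view would feed its own model; the sketch only shows how one sentence yields several parallel input sequences of different lengths.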
Each model in the multi-modal model performs semantic analysis on its own kind of input; for example, the word model performs semantic analysis based on words. The most appropriate and accurate word segmentation result is finally determined from all the models in the multi-modal model (such as the word model, character model, pinyin model and glyph model).
Meanwhile, the attribute of each segmented word can be determined from the word segmentation process and result, and the segmented words that share an attribute can be identified; segmented words determined to have the same attribute can be used as indexes with the same attribute. Typically, an index contains one or more segmented words. For example, the index "GDP" consists of the single word "GDP", while the index "per capita steel consumption" may consist of the three words "per capita", "steel" and "consumption". The number of segmented words in an index depends on the actual situation and the segmentation method; for example, "per capita steel consumption" may also be treated as a single word, in which case that one word corresponds to one index.
Step 103: Establish dependency relationships between words according to the word segmentation result, and convert the question information, according to the dependency relationships, into a machine-language query statement corresponding to each index.
In the embodiment of the invention, the dependency relationships between words may be established by a deep learning model. The dependency relationships reveal the syntactic structure of the question information (dependency parsing), so the question information can be parsed into a dependency syntax tree and the natural language can be translated into query statements that a machine can understand. Because the question information includes a plurality of indexes with the same attribute, the query statements can be generated per index, that is, one index corresponds to one query statement. For example, in this embodiment the question information "GDP of China, the United States and Japan" is converted into three query statements: "GDP of China", "GDP of the United States" and "GDP of Japan".
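The one-query-per-index expansion can be illustrated as follows. This is a hedged sketch, not the patent's conversion procedure: it starts from indexes that are assumed to be already grouped into entities and metrics, and it assumes SQL as the machine language along with a hypothetical `indicators` table with `entity`, `metric`, `year` and `value` columns, none of which the patent specifies.

```python
from itertools import product


def expand_queries(entities, metrics):
    """Return one structured query per (entity, metric) pair."""
    return [{"entity": e, "metric": m} for e, m in product(entities, metrics)]


def to_sql(query, table="indicators"):
    """Render a query as a parameterized SQL string plus its arguments.

    The table and column names are hypothetical placeholders.
    """
    sql = f"SELECT year, value FROM {table} WHERE entity = ? AND metric = ?"
    return sql, (query["entity"], query["metric"])


queries = expand_queries(["China", "United States", "Japan"], ["GDP"])
statements = [to_sql(q) for q in queries]
```

The same expansion also covers the "one entity, several metrics" case from the earlier example, for instance `expand_queries(["China"], ["per capita GDP", "per capita steel consumption"])`.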
Step 104: Query a corresponding database according to the query statements, determine a query result corresponding to each index, and display the query results of all the indexes in the same coordinate system.
In the embodiment of the invention, a corresponding database is set up in advance for the user to query; for financial questions, financial text may be obtained in various ways (such as web crawling) and used to build a database of financial data. After the query statements are determined, the database is queried, and the query result corresponding to each index is extracted and displayed for the user. The query results of all indexes with the same attribute are displayed in the same coordinate system, so that the user can compare the results across indexes. "The same coordinate system" means that the results share the same horizontal axis and the same vertical axis.
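Before several results can be drawn against one pair of axes, they have to be aligned on a common horizontal axis. The sketch below shows one simple way to do that in plain Python; the function name, the `{x: y}` result format and the sample values are assumptions made for illustration, and the values are placeholders rather than real statistics.

```python
def align_on_shared_axis(results):
    """Align per-index {x: y} results on one sorted x-axis.

    Missing points become None so every series has the same length.
    """
    x_axis = sorted({x for series in results.values() for x in series})
    aligned = {name: [series.get(x) for x in x_axis]
               for name, series in results.items()}
    return x_axis, aligned


# Placeholder values, not real statistics.
results = {
    "index A": {2016: 1.0, 2017: 1.5, 2018: 2.0},
    "index B": {2017: 3.0, 2018: 2.5},
}
x_axis, aligned = align_on_shared_axis(results)
```

The aligned series can then be handed to any plotting library, with `x_axis` as the shared horizontal axis and a single value scale as the shared vertical axis.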
In the method for obtaining answers to multi-index questions provided by the embodiment of the invention, the multi-modal model is used to segment the question information input by the user, so that multiple indexes with the same attribute can be extracted and the dependency relationships between words can be established. The question information is then converted into machine-language query statements according to the dependency relationships, so that the query result of each index can be determined quickly and the query results of several indexes can be displayed at the same time. Because word segmentation draws on multiple modalities, semantic analysis is more accurate and so is the word segmentation result. Because the dependency relationships between words describe the semantic information of the sentence completely, the query statements are more accurate and return more accurate results. Extracting the indexes with the same attribute and converting the question information into one query statement per index simplifies the original question and allows the result for each index to be queried precisely. Displaying the query results of all indexes together makes it easy for the user to compare them.
On the basis of the above embodiment, and referring to Fig. 2, performing word segmentation on the question information based on the multi-modal model in step 102 includes steps 1021 to 1025:
Step 1021: Determine an initial word segmentation result through a preset word segmentation model, and determine the first semantics of the question information by taking the word as the basic unit.
In the embodiment of the invention, the word segmentation model may be an existing one, such as an off-the-shelf Chinese word segmenter. Based on the initial word segmentation result, a word model is established with words as basic units; the word model may specifically be a long short-term memory (LSTM) neural network, and the semantics of the question information can be determined based on it. For example, for the sentence 我是一个中国人 ("I am a Chinese person"), the input of the word model is 我/是/一个/中国人, and the corresponding semantics are output.
Step 1022: Determine all characters of the question information, and determine the second semantics of the question information by taking the character as the basic unit.
A traditional word segmentation model takes only words as language units and ignores the semantics of Chinese at the character level. In the embodiment of the invention, a character model is established with characters as basic units; the character model may also be an LSTM neural network, and sentence-level semantics can be derived from it. For example, for the sentence 我是一个中国人 ("I am a Chinese person"), the input of the character model is 我/是/一/个/中/国/人.
Step 1023: Determine the pinyin corresponding to each character, determine a pinyin vector for each pinyin, determine the first character vector corresponding to the pinyin vectors through a convolutional neural network, and then determine the third semantics of the question information from the first character vectors by taking the character as the basic unit.
Chinese characters have a phonetic attribute: the pronunciation of each character carries semantic information to a certain extent. In the embodiment of the invention, each Chinese character is therefore mapped to its pinyin, and each pinyin character is represented by a vector. A convolutional neural network (CNN) then derives a character vector, namely the first character vector, from the pinyin vectors, and a further LSTM layer combines the semantics of the characters into the semantics of the sentence. For example, for the sentence 我是一个中国人 ("I am a Chinese person"), the input is wo/shi/yi/ge/zhong/guo/ren.
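To make the "pinyin vectors, then CNN, then character vector" step concrete, here is a small pure-Python sketch of a one-dimensional convolution with max-over-time pooling. It reads "pinyin character" as a single pinyin letter, which is an interpretation rather than something the text states outright. The embedding size, filter width, number of filters and the random weights are all stand-ins for trained parameters, and the function names are hypothetical.

```python
import random


def embed_letters(syllable, table, dim=4):
    """Look up (or lazily create) a small vector for each pinyin letter."""
    for letter in syllable:
        if letter not in table:
            table[letter] = [random.uniform(-1, 1) for _ in range(dim)]
    return [table[letter] for letter in syllable]


def conv1d_max_pool(vectors, filters, width=2):
    """Slide each filter over windows of `width` letters, then max-pool.

    Each filter is a flat list of width * dim weights; the output has one
    value per filter, independent of the syllable length.
    """
    dim = len(vectors[0])
    # Pad short syllables with zero vectors so at least one window exists.
    padded = vectors + [[0.0] * dim] * max(0, width - len(vectors))
    windows = [sum(padded[i:i + width], [])
               for i in range(len(padded) - width + 1)]
    return [max(sum(w * x for w, x in zip(f, win)) for win in windows)
            for f in filters]


random.seed(0)  # random weights stand in for trained parameters
letter_table = {}
filters = [[random.uniform(-1, 1) for _ in range(2 * 4)] for _ in range(3)]
char_vectors = [conv1d_max_pool(embed_letters(s, letter_table), filters)
                for s in ["wo", "shi", "zhong"]]
```

Every syllable, whether two letters or five, ends up as a vector with one value per filter, which is what lets a following sequence model consume one fixed-size vector per character.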
Step 1024: Generate a corresponding glyph image for each character, convert the glyph image into a corresponding second character vector, and determine the fourth semantics of the question information from the second character vectors by taking the character as the basic unit.
In the embodiment of the invention, because Chinese characters are pictographic and the shape of each character carries rich semantics, a glyph model is added: each Chinese character is treated as a picture, and a convolutional neural network from machine vision converts each glyph picture into a vector. In this way the graphic meaning of the Chinese character is captured, and a further LSTM layer then combines the semantics of the characters into the semantics of the sentence.
Step 1025: Determine the semantic information of the question information jointly from the first, second, third and fourth semantics, perform word segmentation on the question information according to the semantic information, determine the final word segmentation result, and extract the plurality of indexes with the same attribute from the question information.
In the embodiment of the invention, the combined semantics are determined by a multi-modal Chinese natural language processing model based on words, characters, pinyin and glyphs. An attention-based neural network then determines the importance, or weight, of the four models in order to produce the final processing result. For the multi-modal model as a whole, the first, second, third and fourth semantics are intermediate results and need not be shown to the user: the multi-modal model converts the question information into the corresponding words, characters, pinyin and glyphs, uses them as model input, and outputs the final word segmentation result. Meanwhile, the attribute of each segmented word can be determined from the word segmentation process and result, and the segmented words that share an attribute can be identified; segmented words determined to have the same attribute can be used as indexes with the same attribute.
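The attention-based weighting of the four models can be sketched as a softmax over per-model scores followed by a weighted sum. The patent does not give the scoring function, so the dot product with a `query` vector below is an assumption standing in for a learned attention mechanism, and the two-dimensional vectors are toy values.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse(semantics, query):
    """Weight each model's semantic vector by its dot product with `query`.

    `semantics` maps a model name to its vector; `query` stands in for a
    learned attention vector.
    """
    names = list(semantics)
    scores = [sum(q * v for q, v in zip(query, semantics[n])) for n in names]
    weights = softmax(scores)
    dim = len(query)
    fused = [sum(w * semantics[n][i] for w, n in zip(weights, names))
             for i in range(dim)]
    return dict(zip(names, weights)), fused


# Toy vectors for the first to fourth semantics.
semantics = {
    "word": [1.0, 0.0],
    "character": [0.0, 1.0],
    "pinyin": [0.5, 0.5],
    "glyph": [0.0, 0.0],
}
weights, fused = fuse(semantics, query=[1.0, 0.0])
```

With this toy query the word model receives the largest weight; in a trained system the weights would depend on the input and on learned parameters.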
On the basis of the above embodiment, after performing word segmentation on the question information based on the multi-modal model in step 102, the method further includes:
performing semantic understanding on the question information according to the word segmentation result, judging from the result of the semantic understanding whether the question information needs to be rewritten, and correcting the question information when it does.
In the embodiment of the invention, some financial questions involve professional terms; requiring the user to type a precise question manually would demand a high level of expertise and cost time and effort. This embodiment therefore corrects the question information based on semantic understanding, so that the subsequently generated query statement is more accurate. For example, if the user inputs the terse phrase "tenfold stocks in ten years", the question may be rewritten as "stocks whose price has increased tenfold over the past ten years", or the like.
On the basis of the above embodiment, performing word segmentation on the question information based on the multi-modal model includes:
performing word segmentation on the question information based on the multi-modal model, and performing part-of-speech tagging on the segmented words based on a preset bidirectional long short-term memory (BiLSTM) recurrent neural network model.
In the embodiment of the invention, the question information is segmented and tagged with parts of speech, and the tagging may specifically be carried out by a preset BiLSTM recurrent neural network model. In this embodiment, a bidirectional language model is trained in advance on a Chinese corpus, and the trained language model is then used to initialize the word vectors. After initialization, the BiLSTM model produces a word vector for each word position, and this vector is used as the input of a classification model that determines the part-of-speech tag of each word. For example, the part-of-speech tagging process for the sentence 我是一个中国人 ("I am a Chinese person") is shown in Fig. 3.
On the basis of the above embodiment, the dependency relationships between words in step 103 may be established by a shift-reduce algorithm based on a stack neural network: at each step, a classifier decides whether the next action is shift or reduce. The algorithm uses two LSTM networks to model, respectively, the tokens that have already been built into the syntax tree (stack LSTM) and the tokens that have not yet been built into it (queue LSTM). A Stack-LSTM based syntactic dependency tree model is shown in Fig. 4. When generating the query statements in step 103, a bidirectional language model may be trained in advance on a Chinese corpus and then used to initialize the word vectors. After initialization, a BiLSTM recurrent neural network model produces a word vector for each word position, and this vector is used as the input of a classification model that determines the part-of-speech tag of each word. The dependency relationships between words are then established by the shift-reduce algorithm, and the question information is converted, according to the dependency relationships, into a machine-language query statement corresponding to each index.
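The control flow of a shift-reduce dependency parser can be shown without any neural network. The sketch below uses the standard arc-standard transition system, in which "reduce" comes in two variants (left-arc and right-arc); this is a swapped-in textbook formulation, not necessarily the exact transition set of the patent. The `choose_action` callable plays the role of the classifier, and here it is a rule-based oracle built from known heads, which is an assumption made so the example is runnable. It handles projective trees only.

```python
def parse(words, choose_action):
    """Arc-standard shift-reduce parsing.

    `choose_action(stack, buffer)` plays the role of the classifier and
    returns "shift", "left" (reduce: top becomes head of second) or
    "right" (reduce: second becomes head of top).
    Returns a list of (head, dependent) word-index pairs.
    """
    stack, buffer, arcs = [], list(range(len(words))), []
    while buffer or len(stack) > 1:
        action = choose_action(stack, buffer)
        if action == "shift":
            stack.append(buffer.pop(0))
        elif action == "left":
            dependent = stack.pop(-2)
            arcs.append((stack[-1], dependent))
        else:  # "right"
            dependent = stack.pop()
            arcs.append((stack[-1], dependent))
    return arcs


def make_oracle(heads):
    """Build a rule-based stand-in for the classifier from gold heads.

    heads[i] is the index of word i's head, or None for the root.
    """
    def choose_action(stack, buffer):
        if len(stack) >= 2:
            top, second = stack[-1], stack[-2]
            if heads[second] == top:
                return "left"
            # Attach `top` only once all of its own dependents are attached.
            if heads[top] == second and all(heads[b] != top for b in buffer):
                return "right"
        return "shift"
    return choose_action


words = ["I", "am", "Chinese"]
heads = [1, None, 1]  # "am" is the root; the other two words depend on it
arcs = parse(words, make_oracle(heads))
```

In a Stack-LSTM parser, `choose_action` would instead be a classifier over learned encodings of the stack and the remaining queue; the loop itself stays the same.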
On the basis of the above embodiment, displaying the query results of all the indexes in the same coordinate system in step 104 includes:
determining the parameter to be displayed for each index in the question information, and determining a display mode corresponding to the parameter, wherein the display mode comprises one or more of a line chart, a bar chart, a pie chart and a table; and displaying the parameter corresponding to each index in that display mode.
In the embodiment of the invention, semantic analysis is performed on the question information to determine the parameter to be displayed. In general, the parameter to be displayed is the last index in the question information. For example, for the question information "GDP of China, the United States and Japan", the parameter to be displayed is "GDP". A parameter may be displayed in several display modes, and one or more of them may be selected in this embodiment. Preferably, one display mode is selected per parameter, and the parameters corresponding to indexes with the same attribute are displayed in the same display mode, so that the user can compare them easily. Taking the user query "GDP of China, the United States and Japan" as an example, one way of displaying the query result is shown in Fig. 5, in which curve 1 represents the GDP of the United States, curve 2 the GDP of China, and curve 3 the GDP of Japan.
Optionally, a rate of change (such as a proportional growth rate) of the parameter of each index may also be determined and displayed together with the parameter. The rate of change of the parameter is typically displayed as a line graph.
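The embodiment does not give a formula for the rate of change. As one possible reading, the sketch below computes a period-over-period growth rate, (current - previous) / previous; this formula and the function name `growth_rates` are assumptions made for illustration.

```python
from typing import List, Optional


def growth_rates(values: List[float]) -> List[Optional[float]]:
    """Growth rate of each value relative to the one before it.

    Returns one entry per consecutive pair; None where the previous
    value is zero and the rate is undefined.
    """
    rates: List[Optional[float]] = []
    for previous, current in zip(values, values[1:]):
        if previous == 0:
            rates.append(None)
        else:
            rates.append((current - previous) / previous)
    return rates


# Placeholder series, not real data: +10% followed by -10%.
example = growth_rates([100.0, 110.0, 99.0])
```

The resulting list has one entry fewer than the input series, so it can be plotted as a line against the second and later periods.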
According to the method for obtaining answers to multi-index questions provided by the embodiment of the invention, word segmentation is performed on the question information input by the user using a multi-modal model, a plurality of indexes with the same attribute can be extracted, and the dependency relationship between words is established; the question information is converted into a query statement in machine language form according to the dependency relationship, so that the query result of each index is quickly determined using the query statement and the query results of the plurality of indexes are displayed simultaneously. Because word segmentation is based on a multi-modal model, semantic analysis is more accurate and the word segmentation result is therefore more accurate; the dependency relationship between words describes the semantic information of the sentence completely and comprehensively, so that the query statement is more accurate and query accuracy is improved; by extracting indexes with the same attribute and converting the question information into a query statement corresponding to each index, the original question information is simplified and the query result corresponding to each index can be queried accurately; and displaying the query results of all indexes simultaneously makes it convenient for the user to compare them. Where necessary, the question is rewritten based on its semantics so that a more accurate query statement is generated. The display mode is determined based on the index and the rate of change of the data is displayed, which makes it convenient for the user to view the query result.
The method flow for obtaining answers to multi-index questions is described in detail above. The method can also be implemented by a corresponding device, whose structure and function are described in detail below.
The apparatus for obtaining answers to multi-index questions provided in the embodiment of the present invention, as shown in FIG. 6, includes:
the question acquiring module 61 is used for acquiring question information input by a user, and the question information comprises a plurality of indexes with the same attribute;
the word segmentation module 62 is configured to perform word segmentation processing on the question information based on a multi-modal model, determine a word segmentation result, and extract a plurality of indexes in the question information, where the multi-modal model includes at least two of a word model, a character model, a pinyin model, and a font model;
the processing module 63 is configured to establish a dependency relationship between words according to the word segmentation result, and convert the question information into a query statement in machine language form corresponding to each index according to the dependency relationship;
and the query display module 64 is configured to query the corresponding database according to the query statement, determine a query result corresponding to each index, and display the query results of all the indexes in the same coordinate system.
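The hand-off between the processing module 63 and the query display module 64 can be sketched as follows, assuming the machine-language query statement is SQL and the database is relational. The table `indicator_values`, its columns, and the inserted rows are hypothetical placeholders; the embodiment does not specify a schema.

```python
import sqlite3
from typing import Dict, List, Tuple

# One parameterized statement is reused for every index (hypothetical schema).
QUERY = (
    "SELECT year, value FROM indicator_values "
    "WHERE indicator = ? AND entity = ? ORDER BY year"
)


def build_queries(parameter: str, entities: List[str]) -> List[Tuple[str, Tuple[str, str]]]:
    """Produce one (statement, arguments) pair per index."""
    return [(QUERY, (parameter, entity)) for entity in entities]


def run_queries(
    conn: sqlite3.Connection, queries: List[Tuple[str, Tuple[str, str]]]
) -> Dict[str, List[Tuple[int, float]]]:
    """Execute each statement and collect the rows under the index's entity name."""
    results: Dict[str, List[Tuple[int, float]]] = {}
    for sql, params in queries:
        results[params[1]] = conn.execute(sql, params).fetchall()
    return results


# In-memory database with placeholder rows (not real statistics).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE indicator_values (indicator TEXT, entity TEXT, year INTEGER, value REAL)"
)
conn.executemany(
    "INSERT INTO indicator_values VALUES (?, ?, ?, ?)",
    [("GDP", "China", 1, 10.0), ("GDP", "China", 2, 12.0), ("GDP", "Japan", 1, 5.0)],
)
queries = build_queries("GDP", ["China", "United States", "Japan"])
results = run_queries(conn, queries)
```

Passing the values as bound arguments rather than formatting them into the statement keeps user-supplied text out of the SQL string. An index with no stored rows simply yields an empty result list.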
On the basis of the above embodiment, the word segmentation module 62 includes:
the word processing unit is used for determining an initial word segmentation result through a preset word segmentation model and determining a first semantic meaning of the question information by taking a word as a basic unit;
the character processing unit is used for determining all characters of the question information and determining a second semantic meaning of the question information by taking the characters as a basic unit;
the pinyin processing unit is used for determining pinyin corresponding to each character, determining a pinyin vector of each pinyin, determining a first character vector corresponding to the pinyin vector through a convolutional neural network, and further determining a third semantic meaning of the question information according to the first character vector by taking the character as a basic unit;
the font processing unit is used for generating a corresponding font picture for each character, converting the font picture into a corresponding second character vector, and further determining a fourth semantic meaning of the question information according to the second character vector by taking the character as a basic unit;
the multi-modal word segmentation unit is used for comprehensively determining semantic information of the question information according to the first semantic, the second semantic, the third semantic and the fourth semantic, performing word segmentation processing on the question information according to the semantic information, determining a final word segmentation result, and extracting a plurality of indexes with the same attribute in the question information.
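How the four semantics are combined is not detailed in this passage. One simple possibility, shown here purely as an assumption, is a weighted average of per-character vectors after the output of every model has been aligned to character positions. The function name `fuse_semantics` and the vectors are hypothetical.

```python
from typing import Dict, List, Optional

Vector = List[float]


def fuse_semantics(
    per_model: Dict[str, List[Vector]],
    weights: Optional[Dict[str, float]] = None,
) -> List[Vector]:
    """Weighted average of the per-character vectors produced by each model."""
    names = list(per_model)
    if not names:
        raise ValueError("at least one model output is required")
    if weights is None:
        weights = {name: 1.0 for name in names}   # equal weighting by default
    total = sum(weights[name] for name in names)
    length = len(per_model[names[0]])
    if any(len(per_model[name]) != length for name in names):
        raise ValueError("all models must cover the same character positions")
    fused: List[Vector] = []
    for pos in range(length):
        dim = len(per_model[names[0]][pos])
        vector = [0.0] * dim
        for name in names:
            if len(per_model[name][pos]) != dim:
                raise ValueError("all vectors must have the same dimension")
            share = weights[name] / total
            for i, x in enumerate(per_model[name][pos]):
                vector[i] += share * x
        fused.append(vector)
    return fused


# One character position, two-dimensional placeholder vectors.
fused = fuse_semantics({
    "word": [[1.0, 0.0]],
    "character": [[0.0, 1.0]],
    "pinyin": [[1.0, 1.0]],
    "font": [[0.0, 0.0]],
})
```

Other fusion schemes, such as concatenating the vectors or learning the weights, would fit the same interface. The fused vectors would then feed the segmentation step.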
On the basis of the above embodiment, the apparatus further comprises a rewriting module;
after the word segmentation module 62 performs word segmentation processing on the question information based on the multi-modal model, the rewriting module is configured to perform semantic understanding processing on the question information according to the word segmentation result, determine whether the question information needs to be rewritten according to the semantic understanding processing result, and correct the question information when rewriting is needed.
On the basis of the above embodiment, the word segmentation module 62 is configured to:
perform word segmentation processing on the question information based on the multi-modal model, and perform part-of-speech tagging on the segmented words based on a preset bidirectional long short-term memory (LSTM) recurrent neural network model.
On the basis of the above embodiment, the query display module 64 is configured to:
determine the parameters to be displayed for each index in the question information, and determine a display mode corresponding to the parameters, wherein the display mode comprises one or more of a curve graph, a bar graph, a pie graph and a table;
and display the parameters corresponding to each index in the display mode.
According to the device for obtaining answers to multi-index questions provided by the embodiment of the invention, word segmentation is performed on the question information input by the user using a multi-modal model, a plurality of indexes with the same attribute can be extracted, and the dependency relationship between words is established; the question information is converted into a query statement in machine language form according to the dependency relationship, so that the query result of each index is quickly determined using the query statement and the query results of the plurality of indexes are displayed simultaneously. Because the device performs word segmentation based on a multi-modal model, semantic analysis is more accurate and the word segmentation result is therefore more accurate; the dependency relationship between words describes the semantic information of the sentence completely and comprehensively, so that the query statement is more accurate and query accuracy is improved; by extracting indexes with the same attribute and converting the question information into a query statement corresponding to each index, the original question information is simplified and the query result corresponding to each index can be queried accurately; and displaying the query results of all indexes simultaneously makes it convenient for the user to compare them. Where necessary, the question is rewritten based on its semantics so that a more accurate query statement is generated. The display mode is determined based on the index and the rate of change of the data is displayed, which makes it convenient for the user to view the query result.
The above description covers only specific embodiments of the present invention, but the scope of protection of the present invention is not limited thereto. Any change or substitution that a person skilled in the art could readily conceive within the technical scope disclosed by the present invention shall fall within the scope of protection of the present invention. Therefore, the scope of protection of the present invention shall be subject to the scope of protection of the claims.

Claims (10)

1. A method for obtaining answers to multi-index questions, comprising:
acquiring question information input by a user, wherein the question information comprises a plurality of indexes with the same attribute;
performing word segmentation processing on the question information based on a multi-modal model, determining a word segmentation result, and extracting a plurality of indexes in the question information, wherein the multi-modal model comprises at least two of a word model, a character model, a pinyin model and a font model;
establishing a dependency relationship between words according to the word segmentation result, and converting the question information into a query statement in machine language form corresponding to each index according to the dependency relationship;
and querying a corresponding database according to the query statement, determining a query result corresponding to each index, and displaying the query results of all the indexes in the same coordinate system.
2. The method of claim 1, wherein performing word segmentation processing on the question information based on the multi-modal model comprises:
determining an initial word segmentation result through a preset word segmentation model, and determining a first semantic meaning of the question information by taking a word as a basic unit;
determining all characters of the question information, and determining a second semantic meaning of the question information by taking the characters as basic units;
determining pinyin corresponding to each character, determining a pinyin vector of each pinyin, determining a first character vector corresponding to the pinyin vector through a convolutional neural network, and further determining a third semantic meaning of the question information according to the first character vector by taking the character as a basic unit;
generating a corresponding font picture for each character, converting the font picture into a corresponding second character vector, and determining a fourth semantic meaning of the question information according to the second character vector by taking the character as a basic unit;
comprehensively determining semantic information of the question information according to the first semantic, the second semantic, the third semantic and the fourth semantic, performing word segmentation processing on the question information according to the semantic information, determining a final word segmentation result, and extracting a plurality of indexes with the same attribute in the question information.
3. The method of claim 1, further comprising, after performing word segmentation processing on the question information based on the multi-modal model:
performing semantic understanding processing on the question information according to the word segmentation result, judging whether the question information needs to be rewritten according to the semantic understanding processing result, and correcting the question information when rewriting is needed.
4. The method of claim 1, wherein performing word segmentation processing on the question information based on the multi-modal model comprises:
performing word segmentation processing on the question information based on the multi-modal model, and performing part-of-speech tagging on the segmented words based on a preset bidirectional long short-term memory (LSTM) recurrent neural network model.
5. The method of claim 1, wherein displaying the query results of all the indexes in the same coordinate system comprises:
determining the parameters to be displayed for each index in the question information, and determining a display mode corresponding to the parameters, wherein the display mode comprises one or more of a curve graph, a bar graph, a pie graph and a table;
and displaying the parameters corresponding to each index in the display mode.
6. An apparatus for obtaining answers to a multi-index question, comprising:
the question acquiring module is used for acquiring question information input by a user, wherein the question information comprises a plurality of indexes with the same attribute;
the word segmentation module is used for performing word segmentation processing on the question information based on a multi-modal model, determining a word segmentation result and extracting a plurality of indexes in the question information, wherein the multi-modal model comprises at least two of a word model, a character model, a pinyin model and a font model;
the processing module is used for establishing the dependency relationship between words according to the word segmentation result and converting the question information into a query statement in machine language form corresponding to each index according to the dependency relationship;
and the query display module is used for querying the corresponding database according to the query statement, determining the query result corresponding to each index, and displaying the query results of all the indexes in the same coordinate system.
7. The apparatus of claim 6, wherein the word segmentation module comprises:
the word processing unit is used for determining an initial word segmentation result through a preset word segmentation model and determining a first semantic meaning of the question information by taking a word as a basic unit;
the character processing unit is used for determining all characters of the question information and determining a second semantic meaning of the question information by taking the characters as a basic unit;
the pinyin processing unit is used for determining pinyin corresponding to each character, determining a pinyin vector of each pinyin, determining a first character vector corresponding to the pinyin vector through a convolutional neural network, and further determining a third semantic meaning of the question information according to the first character vector by taking the character as a basic unit;
the font processing unit is used for generating a corresponding font picture for each character, converting the font picture into a corresponding second character vector, and further determining a fourth semantic meaning of the question information according to the second character vector by taking the character as a basic unit;
and the multi-modal word segmentation unit is used for comprehensively determining semantic information of the question information according to the first semantic, the second semantic, the third semantic and the fourth semantic, performing word segmentation processing on the question information according to the semantic information, determining a final word segmentation result, and extracting a plurality of indexes with the same attribute in the question information.
8. The apparatus of claim 6, further comprising a rewrite module;
after the word segmentation module performs word segmentation processing on the question information based on the multi-modal model, the rewriting module is used for performing semantic understanding processing on the question information according to the word segmentation result, judging whether the question information needs to be rewritten according to the semantic understanding processing result, and correcting the question information when rewriting is needed.
9. The apparatus of claim 6, wherein the word segmentation module is configured to:
perform word segmentation processing on the question information based on the multi-modal model, and perform part-of-speech tagging on the segmented words based on a preset bidirectional long short-term memory (LSTM) recurrent neural network model.
10. The apparatus of claim 6, wherein the query display module is configured to:
determine the parameters to be displayed for each index in the question information, and determine a display mode corresponding to the parameters, wherein the display mode comprises one or more of a curve graph, a bar graph, a pie graph and a table;
and display the parameters corresponding to each index in the display mode.
CN201911106796.8A 2019-11-13 2019-11-13 Method and device for obtaining multi-index question answers Pending CN110851484A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911106796.8A CN110851484A (en) 2019-11-13 2019-11-13 Method and device for obtaining multi-index question answers

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911106796.8A CN110851484A (en) 2019-11-13 2019-11-13 Method and device for obtaining multi-index question answers

Publications (1)

Publication Number Publication Date
CN110851484A true CN110851484A (en) 2020-02-28

Family

ID=69600777

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911106796.8A Pending CN110851484A (en) 2019-11-13 2019-11-13 Method and device for obtaining multi-index question answers

Country Status (1)

Country Link
CN (1) CN110851484A (en)

Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2009003328A1 (en) * 2007-06-29 2009-01-08 Zte Corporation Data query system and method
CN104252533A (en) * 2014-09-12 2014-12-31 百度在线网络技术(北京)有限公司 Search method and search device
CN104503998A (en) * 2014-12-05 2015-04-08 百度在线网络技术(北京)有限公司 Type identifying method and device aiming at query sentence of user
CN106250364A (en) * 2016-07-20 2016-12-21 科大讯飞股份有限公司 A kind of text modification method and device
CN107992543A (en) * 2017-11-27 2018-05-04 上海智臻智能网络科技股份有限公司 Question and answer exchange method and device, computer equipment and computer-readable recording medium
CN109033244A (en) * 2018-07-05 2018-12-18 百度在线网络技术(北京)有限公司 Search result ordering method and device
CN109522397A (en) * 2018-11-15 2019-03-26 平安科技(深圳)有限公司 Information processing method and device based on semanteme parsing
CN109863554A (en) * 2016-10-27 2019-06-07 香港中文大学 Acoustics font model and acoustics font phonemic model for area of computer aided pronunciation training and speech processes
CN110188163A (en) * 2019-04-13 2019-08-30 上海策友信息科技有限公司 Data intelligence processing system based on natural language
CN110334357A (en) * 2019-07-18 2019-10-15 北京香侬慧语科技有限责任公司 A kind of method, apparatus, storage medium and electronic equipment for naming Entity recognition
CN110348025A (en) * 2019-07-18 2019-10-18 北京香侬慧语科技有限责任公司 A kind of interpretation method based on font, device, storage medium and electronic equipment
CN110427467A (en) * 2019-06-26 2019-11-08 深圳追一科技有限公司 Question and answer processing method, device, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200228
