WO2021135439A1 - Speech semantics-based information search method and related device - Google Patents

Speech semantics-based information search method and related device

Info

Publication number
WO2021135439A1
WO2021135439A1 PCT/CN2020/117387 CN2020117387W WO2021135439A1 WO 2021135439 A1 WO2021135439 A1 WO 2021135439A1 CN 2020117387 W CN2020117387 W CN 2020117387W WO 2021135439 A1 WO2021135439 A1 WO 2021135439A1
Authority
WO
WIPO (PCT)
Prior art keywords
entity
retrieval
information
search
query sentence
Prior art date
Application number
PCT/CN2020/117387
Other languages
French (fr)
Chinese (zh)
Inventor
胡逸天
李琪
孟令成
魏俊勇
游志刚
Original Assignee
平安科技(深圳)有限公司
Priority date
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司
Publication of WO2021135439A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/332 Query formulation
    • G06F16/3329 Natural language query formulation or dialogue systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/31 Indexing; Data structures therefor; Storage structures
    • G06F16/316 Indexing structures
    • G06F16/322 Trees
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3343 Query execution using phonetics
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/338 Presentation of query results

Definitions

  • This application relates to artificial intelligence, and in particular to an information retrieval method, device, computer equipment and storage medium based on speech semantics.
  • Intelligent question answering involves semantic analysis and speech recognition in the field of artificial intelligence.
  • In intelligent question answering, a computer obtains a user's query instruction, analyzes it, and retrieves the corresponding answer information for display.
  • The user's query content and ways of expression are diverse and difficult to restrict. Therefore, accurately understanding the user's query intention and retrieving the answer information accurately and quickly is the key to intelligent question answering.
  • Traditional intelligent question answering technology usually relies on keyword capture, that is, searching based on the keywords in the user's query sentence.
  • The inventor realized that keywords alone rarely capture the complete question entered by the user, which makes it difficult to retrieve answer information that matches the user's intention, so the accuracy of information retrieval is low.
  • the purpose of the embodiments of the present application is to propose an information retrieval method, device, computer equipment, and storage medium based on speech semantics, so as to solve the problem of low accuracy of information retrieval.
  • the embodiments of the present application provide an information retrieval method based on speech semantics, which adopts the following technical solutions:
  • Parse the user query sentence and replace the instance entity in the user query sentence with a conceptual entity to obtain a template query sentence;
  • the conceptual entity is the entity type to which the instance entity belongs;
  • Information retrieval is performed on the database according to the retrieval tree, and the retrieved answer information is displayed.
  • an embodiment of the present application also provides an information retrieval device based on speech semantics, including:
  • a sentence acquisition module, used to acquire the input user query sentence;
  • an entity replacement module, used to parse the user query sentence and replace the instance entity in the user query sentence with a conceptual entity to obtain a template query sentence;
  • the conceptual entity is the entity type to which the instance entity belongs;
  • a similarity calculation module, used to calculate the similarity between the template query sentence and each inventory query sentence in the question corpus;
  • a sentence determination module, used to determine, according to the calculated similarity, the inventory query sentence matching the template query sentence and the retrieval logic formula corresponding to that inventory query sentence;
  • a logic update module, used to update the retrieval logic formula according to the instance entity;
  • a retrieval tree generation module, used to generate the retrieval tree based on the updated retrieval logic formula;
  • an information retrieval module, used to perform information retrieval on the database according to the retrieval tree and display the retrieved answer information.
  • an embodiment of the present application further provides a computer device, including a memory and a processor, the memory stores computer-readable instructions, and the processor implements the following steps when executing the computer-readable instructions:
  • Parse the user query sentence and replace the instance entity in the user query sentence with a conceptual entity to obtain a template query sentence;
  • the conceptual entity is the entity type to which the instance entity belongs;
  • Information retrieval is performed on the database according to the retrieval tree, and the retrieved answer information is displayed.
  • embodiments of the present application also provide a computer-readable storage medium, the computer-readable storage medium stores computer-readable instructions, and the computer-readable instructions implement the following steps when executed by a processor:
  • Parse the user query sentence and replace the instance entity in the user query sentence with a conceptual entity to obtain a template query sentence;
  • the conceptual entity is the entity type to which the instance entity belongs;
  • Information retrieval is performed on the database according to the retrieval tree, and the retrieved answer information is displayed.
  • The embodiments of the present application mainly have the following beneficial effects: first, the instance entities in the obtained user query sentence are replaced to obtain the template query sentence, which removes the personalized information from the user query sentence; then the similarity between the template query sentence and each inventory query sentence in the question corpus is calculated, and the inventory query sentence matching the user query sentence and its retrieval logic formula are determined according to the similarity. This improves the ability to handle user query sentences of various forms and ensures the accuracy and usability of information retrieval. A retrieval tree is then generated based on the retrieval logic formula.
  • The retrieval tree indicates how to retrieve information from multiple databases. Retrieval based on the retrieval tree can accurately retrieve the information targeted by the user query sentence from the databases, further improving the accuracy of information retrieval.
  • FIG. 1 is an exemplary system architecture diagram to which the present application can be applied;
  • FIG. 2 is a flowchart of an embodiment of an information retrieval method based on speech semantics according to the present application;
  • FIG. 3 is a schematic diagram of a single-triple single-medium retrieval tree in an embodiment;
  • FIG. 4 is a schematic diagram of a multi-triple multi-medium retrieval tree in an embodiment;
  • FIG. 5 is a flowchart of a specific implementation of step S207 in FIG. 2;
  • FIG. 6 is a schematic diagram showing answer information in a bar graph in an embodiment;
  • FIG. 7 is a schematic diagram showing answer information in a line chart in an embodiment;
  • FIG. 8 is a schematic structural diagram of an embodiment of an information retrieval device based on speech semantics according to the present application;
  • FIG. 9 is a schematic structural diagram of an embodiment of a computer device according to the present application.
  • the system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105.
  • the network 104 is used to provide a medium for communication links between the terminal devices 101, 102, 103 and the server 105.
  • the network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, and so on.
  • the user can use the terminal devices 101, 102, and 103 to interact with the server 105 through the network 104 to receive or send messages and so on.
  • Various communication client applications such as web browser applications, shopping applications, search applications, instant messaging tools, email clients, and social platform software, may be installed on the terminal devices 101, 102, and 103.
  • The terminal devices 101, 102, and 103 may be various electronic devices that have display screens and support web browsing, including but not limited to smartphones, tablet computers, e-book readers, MP3 (Moving Picture Experts Group Audio Layer III) players, MP4 (Moving Picture Experts Group Audio Layer IV) players, laptop computers, and desktop computers.
  • the server 105 may be a server that provides various services, for example, a background server that provides support for pages displayed on the terminal devices 101, 102, and 103.
  • the information retrieval method based on speech semantics provided by the embodiments of the present application is generally executed by a server, and accordingly, the information retrieval device based on speech semantics is generally set in the server.
  • terminal devices, networks, and servers in FIG. 1 are merely illustrative. There can be any number of terminal devices, networks, and servers according to implementation needs.
  • Referring to FIG. 2, a flowchart of an embodiment of an information retrieval method based on speech semantics according to the present application is shown.
  • the information retrieval method based on speech semantics includes the following steps:
  • Step 201 Obtain the input user query sentence.
  • the electronic device (such as the server shown in FIG. 1) on which the speech semantic-based information retrieval method runs can communicate with the terminal through a wired connection or a wireless connection.
  • The above-mentioned wireless connection methods include, but are not limited to, 3G/4G, WiFi, Bluetooth, WiMAX, Zigbee, and UWB (ultra wideband) connections, as well as other wireless connection methods currently known or developed in the future.
  • the user query statement may be a query statement input by the user.
  • the user inputs the user query statement in text form on the information retrieval page, and the terminal displaying the information retrieval page sends the user query statement to the server.
  • The user can also ask questions by voice; the input voice is converted into a user query sentence in text form through speech recognition.
  • The user can perform a voice query through an input method that supports voice input. Alternatively, the information retrieval page can call an application program interface provided by a third party to convert the voice, or the terminal can send the voice to the server, and the server performs the voice-to-text conversion.
  • Step 202 Parse the user query statement, replace the instance entity in the user query statement with a conceptual entity to obtain a template query statement; the conceptual entity is the entity type to which the instance entity belongs.
  • the instance entity may be a named entity in the user query statement; the entity type may be the category attribute of the instance entity.
  • the server parses the user query sentence to identify the instance entity in the user query sentence, and determines the entity type of the instance entity through semantic recognition to determine the conceptual entity corresponding to the instance entity.
  • the intent of the user query sentence is to instruct the server to retrieve information related to the instance entity.
  • Instance entities can be named entities in user query sentences, including person names, place names, organization names, numbers, dates, currencies, addresses, proper nouns, etc.
  • the server replaces the instance entity in the user query statement with the conceptual entity to obtain the template query statement; at the same time, the server retains the replaced instance entity so that the new retrieval logic can be assembled in subsequent operations.
  • For example, the user query sentence is "When was M established?"
  • The server identifies the instance entity "M". Assuming that "M" is the abbreviation of a company and that the company belongs to the entity type "institution", the conceptual entity corresponding to "M" is "institution". "M" is then replaced with "<institution>" to obtain the template query sentence "When was <institution> established?", while the replaced instance entity "M" is retained.
  • The step of parsing the user query sentence and replacing the instance entity in the user query sentence with a conceptual entity to obtain the template query sentence specifically includes: identifying the instance entity in the user query sentence, and determining the entity type of the instance entity through semantic recognition to obtain the conceptual entity representing the entity type; querying the standard entity corresponding to the instance entity from the standard entity list; and replacing the instance entity in the user query sentence with the conceptual entity to obtain the template query sentence, and storing the instance entity in association with the standard entity.
  • The server parses the user query sentence, recognizes the named entities in it through Named Entity Recognition (NER, also known as proper name recognition), uses each recognized named entity as an instance entity, and determines through semantic recognition the entity type to which the instance entity belongs, thereby determining the conceptual entity that represents the entity type.
  • The instance entity in the user query sentence may be abbreviated or irregular, whereas the information stored in the database is described in a standard form.
  • the standard entities are stored in the standard entity list.
  • the server obtains the pre-established standard entity list, and searches the standard entity list for the standard entity corresponding to the instance entity through fuzzy matching.
  • the server replaces the instance entity in the user query statement with the conceptual entity to obtain the template query statement, and at the same time associates the standard entity with the instance entity and stores it in the entity association table.
  • the entity association table is used to store instance entities and corresponding standard entities in user query statements.
  • For example, the instance entity in the user query sentence is "M", which is an abbreviation, while the full name "M Co., Ltd." is stored in the database; "M Co., Ltd." is the standard entity corresponding to "M".
  • The user query sentence becomes "When was <institution> established?", and the instance entity "M" is associated with the standard entity "M Co., Ltd." for the subsequent assembly of a new retrieval logic formula.
  • In this embodiment, the instance entity in the user query sentence is identified, and its entity type and the conceptual entity representing that type are determined; the standard entity corresponding to the instance entity is queried, and the instance entity in the user query sentence is replaced with the conceptual entity. This turns diverse user query sentences into a standardized form and reduces the personalized information they carry, which makes it easier to find inventory query sentences by similarity and ensures the accuracy of information retrieval. Storing the instance entity in association with the standard entity allows a new retrieval logic formula to be assembled later.
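  • Step 202 can be sketched as follows. This is a minimal illustration, not the application's implementation: the lookup table ENTITY_TYPES stands in for NER and semantic recognition, STANDARD_ENTITIES stands in for the standard entity list, and the fuzzy matching uses Python's difflib; all of these names and choices are assumptions.

```python
import re
from difflib import SequenceMatcher

# Hypothetical lookup tables; a real system would use an NER model
# and a curated standard entity list instead.
ENTITY_TYPES = {"M": "institution"}
STANDARD_ENTITIES = ["M Co., Ltd.", "N Co., Ltd."]


def closest_standard_entity(instance, candidates):
    """Fuzzy-match an instance entity to the most similar standard entity."""
    return max(candidates, key=lambda c: SequenceMatcher(None, instance, c).ratio())


def to_template(query):
    """Replace each known instance entity with its conceptual entity.

    Returns the template query sentence and an entity association table
    mapping each instance entity to its standard entity.
    """
    template = query
    associations = {}
    for instance, entity_type in ENTITY_TYPES.items():
        pattern = r"\b" + re.escape(instance) + r"\b"
        if re.search(pattern, template):
            template = re.sub(pattern, f"<{entity_type}>", template)
            associations[instance] = closest_standard_entity(instance, STANDARD_ENTITIES)
    return template, associations


template, associations = to_template("When was M established?")
print(template)
print(associations)
```

  • The instance entity is kept in the association table rather than discarded, because the later update of the retrieval logic formula needs it.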
  • Step 203 Calculate the similarity between the template query sentence and each inventory query sentence in the question corpus.
  • the inventory query sentence can be a sentence stored in the question corpus;
  • The retrieval logic formula is another representation of the inventory query sentence; it is used to construct a retrieval tree and characterizes the retrieval logic.
  • the inventory query statement corresponds to the retrieval logic formula, and multiple inventory query statements can correspond to the same retrieval logic formula.
  • the server accesses the question corpus, and converts each inventory query sentence and template query sentence in the question corpus into a sentence vector. Through the preset similarity formula, the similarity between the sentence vector of the template query sentence and the sentence vector of each inventory query sentence is calculated.
  • The similarity can be calculated with methods such as cosine similarity, edit distance, the Jaccard coefficient, or TF-IDF (term frequency TF weighted by inverse document frequency IDF). The cosine similarity is calculated by Formula (1):
  • $$\cos(\theta) = \frac{\mathrm{QuestionA} \cdot \mathrm{QuestionB}}{\lVert \mathrm{QuestionA} \rVert \, \lVert \mathrm{QuestionB} \rVert} \tag{1}$$
  • where QuestionA is the sentence vector of the template query sentence and QuestionB is the sentence vector of the inventory query sentence.
  • Step 204 Determine the inventory query sentence matching the template query sentence and the retrieval logic formula corresponding to the inventory query sentence according to the calculated similarity.
  • The server compares each calculated similarity with a preset similarity threshold and, among the similarities greater than the threshold, selects the inventory query sentence with the maximum similarity as the inventory query sentence matching the template query sentence.
  • The server queries the question corpus for the retrieval logic formula corresponding to the inventory query sentence, and establishes the mapping relationship among the user query sentence, the template query sentence, the inventory query sentence, and the retrieval logic formula.
  • For example, the user query sentence is "When was M established?", and the template query sentence "When was <institution> established?" is obtained after the instance entity is replaced.
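  • Steps 203 and 204 can be sketched as follows, assuming bag-of-words sentence vectors and cosine similarity. The threshold value, the corpus contents, and the string form of the retrieval logic formulas are all hypothetical placeholders; the text does not specify them.

```python
import math
from collections import Counter

SIMILARITY_THRESHOLD = 0.6  # assumed value; the text only says "preset"

# Hypothetical question corpus: inventory query sentence -> retrieval logic formula.
QUESTION_CORPUS = {
    "When was <institution> founded?": "registration_date(N)",
    "Who invested in <institution>?": "investors(N)",
}


def sentence_vector(sentence):
    """Bag-of-words sentence vector: token -> count."""
    return Counter(sentence.lower().rstrip("?").split())


def cosine_similarity(a, b):
    """Cosine similarity of two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_match(template, question_corpus, threshold=SIMILARITY_THRESHOLD):
    """Return (inventory sentence, retrieval logic) with the highest
    similarity above the threshold, or None when nothing qualifies."""
    template_vec = sentence_vector(template)
    best, best_score = None, threshold
    for inventory, logic in question_corpus.items():
        score = cosine_similarity(template_vec, sentence_vector(inventory))
        if score > best_score:
            best, best_score = (inventory, logic), score
    return best


print(best_match("When was <institution> established?", QUESTION_CORPUS))
```

  • Because the instance entity has already been replaced by its conceptual entity, sentences about different companies share the token "<institution>" and can match the same inventory query sentence.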
  • Step 205 Update the search logic formula according to the instance entity.
  • The standard entity in the retrieval logic formula is the standard entity retrieved in a previous search.
  • Therefore, the server needs to update the retrieval logic formula based on the instance entity in the user query sentence of the current search.
  • The step of updating the retrieval logic formula according to the instance entity specifically includes: obtaining the standard entity stored in association with the instance entity; and replacing the standard entity in the retrieval logic formula with the obtained standard entity.
  • For example, in a retrieval logic formula the position occupied by the standard entity "N" is variable, while the other parts of the formula are fixed.
  • The previous search may have been for the "date of establishment of N", so the retrieval logic formula still contains "N" after that search. The current search is for "M", so "N" is replaced with "M Co., Ltd."; otherwise the generated retrieval tree would target "N".
  • the server obtains the standard entity associated with the instance entity in the user query statement from the entity association table, and replaces the standard entity in the retrieval logic formula with the obtained standard entity.
  • In this embodiment, the standard entity in the retrieval logic formula is replaced with the standard entity associated with the instance entity in the user query sentence.
  • The updated retrieval logic formula therefore targets the current search, ensuring that the information relevant to this search can be accurately obtained from the database.
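  • Step 205 can be sketched as follows. The RetrievalLogic class is a hypothetical shape for a retrieval logic formula (the text does not define its concrete form); only the entity slot is treated as variable.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class RetrievalLogic:
    """Hypothetical retrieval logic formula: only `entity` is variable."""
    entity: str     # standard entity slot (variable part)
    attribute: str  # attribute to look up (fixed part)


def update_logic(logic, instance_entity, entity_association_table):
    """Replace the stale standard entity with the one associated
    with the instance entity of the current query."""
    standard_entity = entity_association_table[instance_entity]
    return replace(logic, entity=standard_entity)


stale = RetrievalLogic(entity="N", attribute="registration date")
updated = update_logic(stale, "M", {"M": "M Co., Ltd."})
print(updated)
```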
  • Step 206 Generate a search tree based on the updated search logic formula.
  • the retrieval tree may be a storage structure based on a binary tree.
  • The retrieval logic formula indicates the final information to be retrieved in each search.
  • When constructing the retrieval tree based on the updated retrieval logic formula, the final information to be retrieved is taken as the root node.
  • Different search formulas can correspond to different search types, and different search types correspond to different search tree structures.
  • the server fills the search tree structure according to the search logic formula to generate a search tree.
  • the retrieval tree can be a binary tree, each internal node in the branches of the binary tree is the information to be retrieved, the left and right branches of the node are the retrieval conditions, and the root node of the binary tree is the information that needs to be retrieved finally.
  • The step of generating a retrieval tree based on the updated retrieval logic formula specifically includes: identifying the retrieval type of the retrieval logic formula; when the retrieval type is single-triple single-medium retrieval, generating a single-triple single-medium retrieval tree; when the retrieval type is multi-triple multi-medium retrieval, generating a multi-triple multi-medium retrieval tree.
  • The retrieval type is determined by the attributes of the retrieved object and the storage media accessed during retrieval; a storage medium can be a database storing information.
  • Different retrieval logic formulas can correspond to different retrieval types; the retrieval types include single-triple single-medium retrieval and multi-triple multi-medium retrieval.
  • When the retrieval type is single-triple single-medium retrieval, a single-triple single-medium retrieval tree is generated. For example, when retrieving a single attribute value of a single entity, the logical form of the retrieval tree is
  • where E represents the standard entity,
  • attr represents the attribute of the standard entity, here attribute A,
  • and attr_value represents the attribute value of attribute A.
  • The retrieval tree structure includes the root node "attribute value", the left leaf node "entity E" and the right leaf node "attribute A". This structure is searched only once, in a single storage medium.
  • the logical form of the search tree is:
  • the corresponding search tree structure includes the root node "attribute value”, the left leaf node "M Co., Ltd.” and the right leaf node "registered date”.
  • the generated search tree is shown in Figure 3.
  • When the retrieval type is multi-triple multi-medium retrieval, a multi-triple multi-medium retrieval tree is generated.
  • The single-triple single-medium retrieval tree and the multi-triple multi-medium retrieval tree are both binary trees, but they differ in depth and shape. When what is retrieved is an attribute value of an entity that has a certain relationship with the instance entity, the logical form of the retrieval tree is
  • where HE is the standard entity and serves as the head entity (head_entity) in the retrieval tree,
  • the other entity is the tail entity (tail_entity) in the retrieval tree,
  • attr represents the attribute of the tail entity, here attribute A, and attr_value represents the attribute value of attribute A.
  • The retrieval tree structure contains the root node "attribute value", a left subtree (left leaf node "entity HE", right leaf node "relation R") and the right leaf node "attribute A".
  • This structure is retrieved once in each of two storage media.
  • the logical form of the search tree is
  • The corresponding retrieval tree structure includes the root node "attribute value", a left subtree (left leaf node "M Co., Ltd.", right leaf node "investment relationship") and the right leaf node "registered date". The generated retrieval tree is shown in FIG. 4.
  • a search tree corresponding to the search type of the search logic is generated, and the search tree indicates how to retrieve information from the database, ensuring that the information related to the user query sentence can be accurately obtained from the database.
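  • The two tree shapes described above can be sketched as follows. The Node class and builder functions are illustrative assumptions; in particular, the label "tail entity" for the root of the left subtree is a guess, since the text names only its two leaves.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """A node of a binary retrieval tree."""
    label: str
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def build_single_triple_tree(entity, attribute):
    """Root 'attribute value' with leaves for the entity and the attribute."""
    return Node("attribute value", Node(entity), Node(attribute))


def build_multi_triple_tree(head_entity, relation, attribute):
    """The left subtree resolves the tail entity from head entity and relation;
    the root then resolves the attribute value of that tail entity."""
    tail = Node("tail entity", Node(head_entity), Node(relation))
    return Node("attribute value", tail, Node(attribute))


def depth(node):
    """Number of levels in the tree."""
    if node is None:
        return 0
    return 1 + max(depth(node.left), depth(node.right))


single = build_single_triple_tree("M Co., Ltd.", "registered date")
multi = build_multi_triple_tree("M Co., Ltd.", "investment relationship", "registered date")
print(depth(single), depth(multi))
```

  • The extra level in the multi-triple tree corresponds to the additional lookup needed to resolve the tail entity before its attribute can be retrieved.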
  • Step 207 Perform information retrieval on the database according to the retrieval tree, and display the retrieved answer information.
  • the node of the search tree is the information to be searched
  • the left and right branches of each node are the search conditions required when searching for the node
  • the root node of the binary tree is the information that needs to be searched finally.
  • the server performs depth-first traversal of the search tree to verify the feasibility of the search tree and obtain a search strategy.
  • For example, the server verifies through depth-first traversal whether each node is grammatically valid: "Zhang San" is a person's name, which does not match the attribute "Registration Date", so retrieving the "Registration Date" of "Zhang San" is not feasible and an error message is returned.
  • depth-first traversal can also determine the search steps in the database, that is, first search the left and right branches of each node to obtain the relevant information of each node, and finally retrieve the relevant information of the root node.
  • the determined search step is the search strategy.
  • the server searches in each database according to the search strategy. After the answer information is retrieved, the answer information is returned to the terminal for display.
  • Depth-first search (DFS) aims to reach the leaf nodes of the retrieval tree (that is, nodes that do not contain any branches).
  • When performing a depth-first search on the retrieval tree, one chain is explored completely first. When a chain has no further branches, the search returns to the previous node and continues to explore the other chains in the tree. The depth-first search ends when no unexplored chain remains in the whole tree.
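  • Deriving the search steps by depth-first traversal can be sketched as a post-order walk, in which the branches of a node (its retrieval conditions) are resolved before the node itself. The Node class and the tuple format of the steps are assumptions made for illustration, and the label "tail entity" is a hypothetical name for the unnamed intermediate node.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Node:
    """A node of a binary retrieval tree."""
    label: str
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def search_strategy(node: Optional[Node]) -> List[Tuple[str, List[str]]]:
    """Post-order depth-first traversal.

    Returns the ordered search steps as (information to retrieve, conditions).
    Leaf nodes are given conditions, so they produce no step of their own.
    """
    if node is None or (node.left is None and node.right is None):
        return []
    steps = search_strategy(node.left) + search_strategy(node.right)
    conditions = [child.label for child in (node.left, node.right) if child is not None]
    steps.append((node.label, conditions))
    return steps


tree = Node(
    "attribute value",
    Node("tail entity", Node("M Co., Ltd."), Node("investment relationship")),
    Node("registration date"),
)
for step in search_strategy(tree):
    print(step)
```

  • The root node always produces the last step, which matches the description that the root is the information finally retrieved.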
  • The method further includes: setting the template query sentence as an inventory query sentence to update the question corpus;
  • and associating the added inventory query sentence with the updated retrieval logic formula.
  • The template query sentence obtained by replacing the instance entity in the user query sentence is added to the question corpus as a new inventory query sentence; the retrieval logic formula updated according to the standard entity and the newly added inventory query sentence are then associated with each other.
  • the newly added inventory query sentences can participate in future searches to continuously enrich the question corpus, improve the system's robustness and the processing ability to deal with different questions.
  • the template query sentence is added to the question corpus and matched with the retrieval logic, which enriches the inventory query sentence in the question corpus and improves the system's processing ability for various user query sentences.
  • In this embodiment, the instance entity in the obtained user query sentence is first replaced to obtain a template query sentence.
  • The template query sentence removes the personalized information from the user query sentence. The similarity between the template query sentence and each inventory query sentence in the question corpus is then calculated, and the inventory query sentence matching the user query sentence and its retrieval logic formula are determined according to the similarity, which improves the ability to handle user query sentences of various forms and ensures the accuracy and usability of information retrieval. A retrieval tree is generated according to the retrieval logic formula; the retrieval tree indicates how to retrieve information from multiple databases. Retrieval based on the retrieval tree can accurately retrieve the information targeted by the user query sentence from the databases, which further ensures the accuracy of information retrieval.
  • the foregoing step 207 may include:
  • Step 2071 Perform a depth-first traversal of the search tree to determine the search strategy corresponding to the search tree, and determine the information type based on the search strategy.
  • The information type is the type of information retrieved for standard entities, including single entity single attribute, entity relationship, single entity multi-attribute, multi-entity single attribute, and attribute change trend (which covers both single entity multi-attribute change trends and multi-entity single attribute change trends).
  • The retrieval frequency of standard entities is obtained from big data or historical data, and an inverted index over the standard entities is built according to the retrieval frequency, so that the required information can be retrieved as quickly as possible.
  • Entity attributes that are not used as search conditions are stored in the traditional relational database PostgreSQL to reduce the load on the ElasticSearch database.
  • Triple data of the form <head entity, relationship, tail entity> is stored in Neo4j, a NoSQL (Not Only SQL, non-relational) graph database.
  • the server determines the retrieval strategy through depth-first traversal, and the retrieval strategy instructs how to obtain information from the database.
  • For example, for a single-triple single-medium retrieval tree, the search strategy is: access the ElasticSearch database and retrieve the registration date of M Co., Ltd. from it.
  • For a multi-triple multi-medium retrieval tree, the search strategy is: search the Neo4j database for tail entities that have an investment relationship with M Co., Ltd., then search the ElasticSearch database for the registration date of each tail entity, and finally splice together the answer information retrieved from the different databases for M Co., Ltd.
  • The information type can be determined from the retrieval strategy. For example, when the retrieval strategy is to access the ElasticSearch database and retrieve the registration date of M Co., Ltd., only one attribute, "Registration Date", of the entity "M Co., Ltd." needs to be retrieved, and the information type is single entity single attribute. When the trade volume of six companies in a certain industry in 2019 needs to be retrieved, the same attribute "trade volume" of six standard entities is retrieved, and the information type is multi-entity single attribute.
  • Step 2072 Perform information retrieval on the database according to the retrieval strategy to obtain answer information.
  • the server accesses the database according to the determined search strategy, extracts information from the database, and obtains answer information.
  • Step 2073 Display the answer information according to the information type.
  • the server determines the display mode of the answer information according to the information type, and the display mode includes text, diagrams, etc.
  • the server sends the answer information to the terminal, and the terminal displays the answer information according to the determined display mode.
  • the step of displaying the answer information according to the information type specifically includes: when the information type is single entity single attribute or entity relationship, displaying the answer information in text; when the information type is single entity multi-attribute or multi-entity single attribute, displaying the answer information in a bar chart; when the information type is an attribute change trend, displaying the answer information in a line chart.
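  • the mapping from information type to display mode described above can be sketched as a lookup table. This is a minimal sketch, not the applicant's implementation: the string labels and the function name `display_mode` are illustrative assumptions.

```python
# Information type -> display mode, following the three cases in the text.
DISPLAY_MODE = {
    "single entity single attribute": "text",
    "entity relationship": "text",
    "single entity multi-attribute": "bar chart",
    "multi-entity single attribute": "bar chart",
    "attribute change trend": "line chart",
}


def display_mode(information_type: str) -> str:
    """Return the display mode for an information type (hypothetical helper)."""
    try:
        return DISPLAY_MODE[information_type]
    except KeyError:
        raise ValueError(f"unknown information type: {information_type!r}") from None
```

  • a table keeps the rule in one place, so adding a further information type only requires a new entry.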
  • when the information type is single entity single attribute or entity relationship, the answer information is displayed in descriptive text.
  • the format of the descriptive text is: <attribute name> of <entity> is <attribute value>. For example: the registration date of M Co., Ltd. is xxxx year xx month xx day, and the answer information dimension is 1*2.
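  • the descriptive-text format above can be sketched as a one-line formatter. The function name `describe` and its parameters are hypothetical, and the attribute value is kept as the placeholder used in the text.

```python
def describe(entity: str, attribute_name: str, attribute_value: str) -> str:
    """Fill the template "<attribute name> of <entity> is <attribute value>"."""
    return f"{attribute_name} of {entity} is {attribute_value}"


sentence = describe("M Co., Ltd.", "registration date", "xxxx year xx month xx day")
```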
  • when the information type is single entity multi-attribute or multi-entity single attribute, the answer information is displayed in a bar chart.
  • the answer information displayed when retrieving the trade volume of six companies in a certain industry is shown in Figure 6.
  • the answer information can also include the date of data and the name of each company.
  • the answer dimension of the answer information displayed in the bar chart is 1*N (N>2) or N*2, where N is a positive integer.
  • when the information type is an attribute change trend, the answer information is displayed in a line chart. For example, when searching for the change trend of the sales of M Co., Ltd. in each quarter of 2019, the displayed answer information is as shown in Figure 7, and the answer information may also include the data date.
  • the answer information is displayed in text, graphics, etc. according to the type of information retrieved, which improves the intelligence of answer information display.
  • Step 2074 upload the answer information to the blockchain.
  • the corresponding summary information is obtained based on the answer information.
  • the summary information is obtained by hashing the answer information, for example, by using the SHA-256 algorithm.
  • Uploading summary information to the blockchain can ensure its security and fairness and transparency to users.
  • the user equipment can download the summary information from the blockchain to verify whether the answer information has been tampered with.
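  • the summary and verification steps described above can be sketched with Python's standard `hashlib` module, assuming the hash is SHA-256 over the UTF-8 encoding of the answer text. The function names are hypothetical, and the actual upload to and download from the blockchain are outside this sketch.

```python
import hashlib
import hmac


def summarize(answer_information: str) -> str:
    """Return the SHA-256 digest of the answer information as a hex string."""
    return hashlib.sha256(answer_information.encode("utf-8")).hexdigest()


def is_untampered(answer_information: str, stored_summary: str) -> bool:
    """Compare a freshly computed digest with the summary read from the chain."""
    return hmac.compare_digest(summarize(answer_information), stored_summary)
```

  • a user equipment would call `is_untampered` with the answer it received and the summary it downloaded; any change to the answer text changes the digest, so the check fails.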
  • in this embodiment, the search tree is traversed depth-first to obtain the search strategy, and the information type is determined based on the search strategy, so that the server can obtain the required information from the database faster and more accurately, display the answer information intelligently according to the information type, and upload the answer information to the blockchain to ensure its security, fairness, and transparency.
  • the information retrieval method based on speech semantics in this application involves neural networks, natural language processing, speech processing, and knowledge representation and reasoning in the field of artificial intelligence; in addition, it may also involve smart life in the field of smart cities.
  • the processes in the above-mentioned embodiment methods can be implemented by instructing relevant hardware through computer-readable instructions, which can be stored in a computer-readable storage medium.
  • when the computer-readable instructions are executed, they may include the processes of the above-mentioned method embodiments.
  • the aforementioned storage medium may be a non-volatile storage medium such as a magnetic disk, an optical disc, or a read-only memory (Read-Only Memory, ROM), or it may be a random access memory (Random Access Memory, RAM), etc.
  • this application provides an embodiment of an information retrieval device based on speech semantics.
  • the device embodiment corresponds to the method embodiment shown in FIG. 2.
  • the device can be specifically applied to various electronic devices.
  • the information retrieval device 300 based on speech semantics in this embodiment includes: a sentence acquisition module 301, an entity replacement module 302, a similarity calculation module 303, a sentence determination module 304, a logical expression update module 305, a search tree generation module 306, and an information retrieval module 307, wherein:
  • the sentence acquisition module 301 is used to acquire the input user query sentence.
  • the entity replacement module 302 is used to parse the user query statement, replace the instance entity in the user query statement with a conceptual entity to obtain a template query statement; the conceptual entity is the entity type to which the instance entity belongs.
  • the similarity calculation module 303 is used to calculate the similarity between the template query sentence and each inventory query sentence in the question corpus.
  • the sentence determination module 304 is configured to determine the inventory query sentence matching the template query sentence and the retrieval logic formula corresponding to the inventory query sentence according to the calculated similarity.
  • the logical expression update module 305 is used to update the retrieval logic formula according to the instance entity.
  • the retrieval tree generation module 306 is configured to generate a retrieval tree based on the updated retrieval logic formula.
  • the information retrieval module 307 is used to perform information retrieval on the database according to the retrieval tree and display the retrieved answer information.
  • in this embodiment, the instance entity in the obtained user query sentence is first replaced to obtain a template query sentence, which removes the personalized information from the user query sentence; the similarity between the template query sentence and each inventory query sentence in the corpus is then calculated, and the inventory query sentence matching the user query sentence and its retrieval logic formula are determined according to the similarity, which improves the ability to process user query sentences of various forms and ensures the accuracy and usability of information retrieval; a search tree is generated according to the retrieval logic formula, and the search tree indicates how to retrieve information from multiple databases. Retrieval based on the search tree can accurately retrieve from the database the information targeted by the user query sentence, which further ensures the accuracy of information retrieval.
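  • the similarity calculation and sentence determination performed by modules 303 and 304 can be sketched as follows. This section does not name a similarity measure, so the sketch swaps in Jaccard similarity over whitespace-separated tokens purely for illustration; the corpus layout (inventory query sentence mapped to a retrieval logic formula string), the `[company]` placeholder, and all function names are assumptions. Whitespace tokenization would not suit Chinese text, which needs a word segmenter.

```python
def jaccard(a: str, b: str) -> float:
    """Jaccard similarity of the token sets of two sentences."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    if not tokens_a and not tokens_b:
        return 1.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def best_match(template_query: str, corpus: dict[str, str]) -> tuple[str, str, float]:
    """Return (inventory sentence, its retrieval logic formula, similarity)."""
    sentence = max(corpus, key=lambda s: jaccard(template_query, s))
    return sentence, corpus[sentence], jaccard(template_query, sentence)


# Hypothetical question corpus with two inventory query sentences.
corpus = {
    "registration date of [company]": "([company], registration date, ?x)",
    "who invested in [company]": "(?x, invests in, [company])",
}
matched, formula, score = best_match("what is the registration date of [company]", corpus)
```

  • in practice a threshold on `score` would decide whether any inventory query sentence counts as a match; the threshold value is not given in this section.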
  • the above-mentioned entity replacement module 302 includes: a sentence parsing sub-module, a standard query sub-module, and an entity replacement sub-module, wherein:
  • the sentence parsing sub-module is used to identify the instance entity in the user's query sentence, and determine the entity type of the instance entity through semantic recognition to obtain the conceptual entity representing the entity type.
  • the standard query sub-module is used to query the standard entity corresponding to the instance entity from the standard entity list.
  • the entity replacement sub-module is used to replace the instance entity in the user query statement with the conceptual entity to obtain the template query statement, and store the instance entity in association with the standard entity.
  • in this embodiment, the instance entity in the user query sentence is identified, and the entity type of the instance entity and the conceptual entity representing that entity type are determined; the standard entity corresponding to the instance entity is queried, and the instance entity in the user query sentence is replaced with the conceptual entity, which changes the user query sentence from diverse to standardized, reduces the personalized information in it, facilitates the subsequent similarity-based lookup of inventory query sentences, and ensures the accuracy of information retrieval; the instance entity and the standard entity are stored in association so that a new retrieval logic formula can be assembled later.
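  • the entity replacement described above can be sketched as follows. The text identifies instance entities through semantic recognition; this sketch substitutes a plain dictionary lookup for that step, and the lookup tables, the `[company]` placeholder format, and the function name `to_template` are all hypothetical.

```python
def to_template(
    user_query: str,
    alias_to_standard: dict[str, str],
    standard_to_concept: dict[str, str],
) -> tuple[str, dict[str, str]]:
    """Replace instance entities with conceptual entities.

    Returns the template query sentence and the association
    {instance entity: standard entity} kept for updating the logic formula.
    """
    template = user_query
    associations: dict[str, str] = {}
    # Try longer aliases first so "M Co., Ltd." wins over its prefix "M Co.".
    for alias in sorted(alias_to_standard, key=len, reverse=True):
        if alias in template:
            standard = alias_to_standard[alias]
            template = template.replace(alias, standard_to_concept[standard])
            associations[alias] = standard
    return template, associations


alias_to_standard = {"M Co.": "M Co., Ltd.", "M Co., Ltd.": "M Co., Ltd."}
standard_to_concept = {"M Co., Ltd.": "[company]"}
template, associations = to_template(
    "When was M Co. registered?", alias_to_standard, standard_to_concept
)
```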
  • the above-mentioned logical expression update module 305 includes: an entity acquisition sub-module and a standard replacement sub-module, wherein:
  • the entity acquisition sub-module is used to acquire the standard entity stored in association with the instance entity.
  • the standard replacement sub-module is used to replace the standard entity in the search logic formula with the obtained standard entity.
  • in this embodiment, the standard entity in the search logic formula is replaced with the standard entity associated with the instance entity in the user query sentence, so that the replaced search logic formula targets this search, ensuring that the information relevant to this search can be accurately obtained from the database.
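  • the replacement of the standard entity in the search logic formula can be sketched as follows. The representation of a logic formula as a list of (head, relationship or attribute, tail) triples, the `?x` variable notation, and the name `update_formula` are assumptions made for illustration; this section does not specify the formula syntax.

```python
Triple = tuple[str, str, str]


def update_formula(formula: list[Triple], old_entity: str, new_entity: str) -> list[Triple]:
    """Return a copy of the formula with old_entity replaced by new_entity."""
    return [
        tuple(new_entity if term == old_entity else term for term in triple)
        for triple in formula
    ]


# The stored formula refers to the entity of the inventory query sentence
# (hypothetical "A Co., Ltd."); this search is about "M Co., Ltd.".
stored = [("A Co., Ltd.", "invests in", "?x"), ("?x", "registration date", "?y")]
updated = update_formula(stored, "A Co., Ltd.", "M Co., Ltd.")
```

  • the stored formula is left unchanged, so it can still serve later queries that match the same inventory query sentence.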
  • the above-mentioned search tree generation module 306 includes: a type recognition sub-module and a search tree generation sub-module, wherein:
  • the type identification sub-module is used to identify the search type of the search logic.
  • the search tree generation sub-module is used to generate a single-triple single-media search tree when the search type is single-triple single-media search.
  • the search tree generation submodule is also used to generate a multi-triple multi-media search tree when the search type is a multi-triple multi-media search.
  • a search tree corresponding to the search type of the search logic is generated, and the search tree indicates how to retrieve information from the database, ensuring that the information related to the user query sentence can be accurately obtained from the database.
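  • the type identification and search tree generation described above can be sketched as follows. The input format (an ordered list of `(storage medium, triple)` lookups), the nested-dictionary tree, and the rule that the first lookup becomes the deepest node are assumptions; combinations other than the two search types named in the text are rejected rather than guessed.

```python
def search_type(lookups: list[tuple[str, str]]) -> str:
    """Classify the search by the number of triples and storage media involved."""
    media = {medium for medium, _ in lookups}
    if len(lookups) == 1:
        return "single-triple single-media"
    if len(media) > 1:
        return "multi-triple multi-media"
    raise ValueError("combination not covered by this sketch")


def build_tree(lookups: list[tuple[str, str]]) -> dict:
    """Chain the lookups into a tree; the first lookup ends up deepest."""
    kind = search_type(lookups)
    node = None
    for medium, triple in lookups:
        node = {"medium": medium, "triple": triple, "children": [node] if node else []}
    return {"type": kind, "root": node}


tree = build_tree([
    ("Neo4j", "(M Co., Ltd., invests in, ?x)"),
    ("ElasticSearch", "(?x, registration date, ?y)"),
])
```

  • with this layout, a depth-first traversal that visits children before their parent reaches the Neo4j lookup before the ElasticSearch lookup.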
  • the above-mentioned information retrieval module 307 includes: a depth traversal sub-module, an information retrieval sub-module, an information display sub-module, and an information upload sub-module, wherein:
  • the depth traversal sub-module is used for depth-first traversal of the search tree to determine the search strategy corresponding to the search tree, and to determine the information type based on the search strategy.
  • the information retrieval sub-module is used to retrieve information from the database according to the retrieval strategy to obtain answer information.
  • the information display sub-module is used to display the answer information according to the information type.
  • the information upload sub-module is used to upload answer information to the blockchain.
  • in this embodiment, the search tree is traversed depth-first to obtain the search strategy, and the information type is determined based on the search strategy, so that the server can obtain the required information from the database faster and more accurately, display the answer information intelligently according to the information type, and upload the answer information to the blockchain to ensure its security, fairness, and transparency.
  • the above-mentioned information display submodule includes: a text display unit, a bar chart display unit, and a line chart display unit, wherein:
  • the text display unit is used to display the answer information in text when the information type is single entity single attribute or entity relationship.
  • the bar graph display unit is used to display the answer information in a bar graph when the information type is a single entity with multiple attributes or multiple entities with a single attribute.
  • the line chart display unit is used to display the answer information in a line chart when the information type is an attribute change trend.
  • the answer information is displayed in text, graphics, etc. according to the type of information retrieved, which improves the intelligence of answer information display.
  • the above-mentioned speech semantic-based information retrieval apparatus 300 further includes: a sentence update module and an association module, wherein:
  • the sentence update module is used to set the template query sentence as the inventory query sentence to update the question sentence corpus.
  • the association module is used to associate the newly added inventory query sentence in the question corpus with the updated retrieval logic.
  • the template query sentence is added to the question corpus and matched with the retrieval logic, which enriches the inventory query sentence in the question corpus and improves the system's processing ability for various user query sentences.
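  • the corpus update performed by the sentence update module and the association module can be sketched as follows, assuming the question corpus is a mapping from inventory query sentence to retrieval logic formula. The function name and the formula string format are hypothetical.

```python
def add_to_corpus(corpus: dict[str, str], template_query: str, logic_formula: str) -> bool:
    """Add the template query sentence as a new inventory query sentence.

    Returns True if the sentence was added, False if it was already present
    (an existing association is left untouched).
    """
    if template_query in corpus:
        return False
    corpus[template_query] = logic_formula
    return True


corpus = {"registration date of [company]": "([company], registration date, ?x)"}
added = add_to_corpus(
    corpus,
    "when was [company] registered",
    "([company], registration date, ?x)",
)
```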
  • FIG. 9 is a block diagram of the basic structure of the computer device in this embodiment.
  • the computer device 4 includes a memory 41, a processor 42, and a network interface 43 that are connected to each other in communication via a system bus. It should be pointed out that the figure only shows the computer device 4 with components 41-43, but it should be understood that it is not required to implement all the shown components, and more or fewer components may be implemented instead. Among them, those skilled in the art can understand that the computer device here is a device that can automatically perform numerical calculation and/or information processing in accordance with pre-set or stored instructions.
  • Its hardware includes, but is not limited to, a microprocessor, an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field-Programmable Gate Array, FPGA), a digital signal processor (Digital Signal Processor, DSP), embedded devices, etc.
  • the computer device may be a computing device such as a desktop computer, a notebook, a palmtop computer, and a cloud server.
  • the computer device can interact with the user through a keyboard, a mouse, a remote control, a touch panel, or a voice control device.
  • the memory 41 includes at least one type of computer-readable storage medium.
  • the computer-readable storage medium may be non-volatile or volatile.
  • the computer-readable storage medium includes flash memory, hard disk, multimedia card, card-type memory (for example, SD or DX memory, etc.), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, magnetic disk, optical disk, etc.
  • the memory 41 may be an internal storage unit of the computer device 4, such as a hard disk or memory of the computer device 4.
  • the memory 41 may also be an external storage device of the computer device 4, such as a plug-in hard disk equipped on the computer device 4, a smart memory card (Smart Media Card, SMC), and a secure digital (Secure Digital, SD) card, Flash Card, etc.
  • the memory 41 may also include both the internal storage unit of the computer device 4 and its external storage device.
  • the memory 41 is generally used to store an operating system and various application software installed in the computer device 4, such as computer-readable instructions of a speech-semantic-based information retrieval method.
  • the memory 41 can also be used to temporarily store various types of data that have been output or will be output.
  • the processor 42 may be a central processing unit (Central Processing Unit, CPU), a controller, a microcontroller, a microprocessor, or other data processing chips in some embodiments.
  • the processor 42 is generally used to control the overall operation of the computer device 4.
  • the processor 42 is configured to run computer-readable instructions or process data stored in the memory 41, for example, run the computer-readable instructions of the speech semantic-based information retrieval method.
  • the network interface 43 may include a wireless network interface or a wired network interface, and the network interface 43 is generally used to establish a communication connection between the computer device 4 and other electronic devices.
  • the computer device provided in this embodiment can execute the steps of the above-mentioned speech semantic-based information retrieval method.
  • the steps of the information retrieval method based on speech semantics may be the steps in the information retrieval method based on speech semantics in each of the above embodiments.
  • in this embodiment, the instance entity in the obtained user query sentence is first replaced to obtain a template query sentence, which removes the personalized information from the user query sentence; the similarity between the template query sentence and each inventory query sentence in the corpus is then calculated, and the inventory query sentence matching the user query sentence and its retrieval logic formula are determined according to the similarity, which improves the ability to process user query sentences of various forms and ensures the accuracy and usability of information retrieval; a search tree is generated according to the retrieval logic formula, and the search tree indicates how to retrieve information from multiple databases. Retrieval based on the search tree can accurately retrieve from the database the information targeted by the user query sentence, which further ensures the accuracy of information retrieval.
  • This application also provides another implementation manner, namely a computer-readable storage medium that stores computer-readable instructions for information retrieval based on speech semantics; these computer-readable instructions may be executed by at least one processor, so that the at least one processor executes the steps of the above-mentioned speech semantic-based information retrieval method.
  • in this embodiment, the instance entity in the obtained user query sentence is first replaced to obtain a template query sentence, which removes the personalized information from the user query sentence; the similarity between the template query sentence and each inventory query sentence in the corpus is then calculated, and the inventory query sentence matching the user query sentence and its retrieval logic formula are determined according to the similarity, which improves the ability to process user query sentences of various forms and ensures the accuracy and usability of information retrieval; a search tree is generated according to the retrieval logic formula, and the search tree indicates how to retrieve information from multiple databases. Retrieval based on the search tree can accurately retrieve from the database the information targeted by the user query sentence, which further ensures the accuracy of information retrieval.
  • the blockchain referred to in this application is a new application mode of computer technology such as distributed data storage, point-to-point transmission, consensus mechanism, and encryption algorithm.
  • Blockchain, essentially a decentralized database, is a series of data blocks associated with one another by cryptographic methods. Each data block contains a batch of network transaction information, which is used to verify the validity of the information (anti-counterfeiting) and to generate the next block.
  • the blockchain can include the underlying platform of the blockchain, the platform product service layer, and the application service layer.
  • the technical solution of this application, in essence or in the part that contributes to the existing technology, can be embodied in the form of a software product. The computer software product is stored in a storage medium (such as ROM/RAM, a magnetic disk, or an optical disc) and includes several instructions that cause a terminal device (which can be a mobile phone, a computer, a server, an air conditioner, a network device, etc.) to execute the methods described in the various embodiments of the present application.

Abstract

Provided is a speech semantics-based information search method, comprising: obtaining an inputted user query statement (201); replacing an instance entity in the user query statement with a concept entity to obtain a template query statement; calculating the similarity between the template query statement and each inventory query sentence in a question corpus (203); according to the calculated similarity, determining an inventory query statement matching the template query statement and a search logic formula corresponding to the inventory query statement (204); updating the search logic according to the instance entity (205); generating a search tree on the basis of the updated search logic (206); performing information search on the database according to the search tree, and displaying searched answer information (207). The described method improves the accuracy of information search.

Description

Information retrieval method based on speech semantics, and related device
This application claims priority to Chinese patent application No. 202010440491.7, filed with the Chinese Patent Office on May 22, 2020 and entitled "Information retrieval method based on speech semantics, and related device", the entire content of which is incorporated herein by reference.
Technical field
This application relates to artificial intelligence, and in particular to an information retrieval method, apparatus, computer device, and storage medium based on speech semantics.
Background
With the development of artificial intelligence, intelligent question answering is increasingly widely used in BI (Business Intelligence) systems. Intelligent question answering involves semantic parsing and speech recognition in the field of artificial intelligence. Typically, a computer obtains a user's query instruction about something, analyzes the query instruction, retrieves the corresponding answer information, and displays it. In intelligent question answering, the content and phrasing of user queries are diverse and hard to constrain, so accurately understanding the user's query intention and retrieving the answer information accurately and quickly is the key to intelligent question answering.
To cope with unpredictable user input, traditional intelligent question answering technology usually relies on keyword capture, that is, it searches based on the keywords in the user query sentence. However, the inventors realized that keyword capture alone makes it difficult to understand the complete question entered by the user and to retrieve answer information that satisfies the user's intention, so the accuracy of information retrieval is low.
Summary of the invention
The purpose of the embodiments of this application is to propose an information retrieval method, apparatus, computer device, and storage medium based on speech semantics, so as to solve the problem of low information retrieval accuracy.
To solve the above technical problem, an embodiment of this application provides an information retrieval method based on speech semantics, which adopts the following technical solution:
obtaining an input user query sentence;
parsing the user query sentence, and replacing the instance entity in the user query sentence with a conceptual entity to obtain a template query sentence, where the conceptual entity is the entity type to which the instance entity belongs;
calculating the similarity between the template query sentence and each inventory query sentence in the question corpus;
determining, according to the calculated similarity, the inventory query sentence matching the template query sentence and the retrieval logic formula corresponding to the inventory query sentence;
updating the retrieval logic formula according to the instance entity;
generating a search tree based on the updated retrieval logic formula;
performing information retrieval on the database according to the search tree, and displaying the retrieved answer information.
To solve the above technical problem, an embodiment of this application further provides an information retrieval apparatus based on speech semantics, including:
a sentence acquisition module, configured to obtain an input user query sentence;
an entity replacement module, configured to parse the user query sentence and replace the instance entity in the user query sentence with a conceptual entity to obtain a template query sentence, where the conceptual entity is the entity type to which the instance entity belongs;
a similarity calculation module, configured to calculate the similarity between the template query sentence and each inventory query sentence in the question corpus;
a sentence determination module, configured to determine, according to the calculated similarity, the inventory query sentence matching the template query sentence and the retrieval logic formula corresponding to the inventory query sentence;
a logical expression update module, configured to update the retrieval logic formula according to the instance entity;
a search tree generation module, configured to generate a search tree based on the updated retrieval logic formula;
an information retrieval module, configured to perform information retrieval on the database according to the search tree and display the retrieved answer information.
To solve the above technical problem, an embodiment of this application further provides a computer device, including a memory and a processor, where the memory stores computer-readable instructions, and the processor implements the following steps when executing the computer-readable instructions:
obtaining an input user query sentence;
parsing the user query sentence, and replacing the instance entity in the user query sentence with a conceptual entity to obtain a template query sentence, where the conceptual entity is the entity type to which the instance entity belongs;
calculating the similarity between the template query sentence and each inventory query sentence in the question corpus;
determining, according to the calculated similarity, the inventory query sentence matching the template query sentence and the retrieval logic formula corresponding to the inventory query sentence;
updating the retrieval logic formula according to the instance entity;
generating a search tree based on the updated retrieval logic formula;
performing information retrieval on the database according to the search tree, and displaying the retrieved answer information.
To solve the above technical problem, an embodiment of this application further provides a computer-readable storage medium that stores computer-readable instructions, where the following steps are implemented when the computer-readable instructions are executed by a processor:
obtaining an input user query sentence;
parsing the user query sentence, and replacing the instance entity in the user query sentence with a conceptual entity to obtain a template query sentence, where the conceptual entity is the entity type to which the instance entity belongs;
calculating the similarity between the template query sentence and each inventory query sentence in the question corpus;
determining, according to the calculated similarity, the inventory query sentence matching the template query sentence and the retrieval logic formula corresponding to the inventory query sentence;
updating the retrieval logic formula according to the instance entity;
generating a search tree based on the updated retrieval logic formula;
performing information retrieval on the database according to the search tree, and displaying the retrieved answer information.
Compared with the prior art, the embodiments of this application mainly have the following beneficial effects: the instance entity in the obtained user query sentence is first replaced to obtain a template query sentence, which removes the personalized information from the user query sentence; the similarity between the template query sentence and each inventory query sentence in the corpus is then calculated, and the inventory query sentence matching the user query sentence and its retrieval logic formula are determined according to the similarity, which improves the ability to process user query sentences of various forms and ensures the accuracy and usability of information retrieval; a search tree is generated according to the retrieval logic formula, and the search tree indicates how to retrieve information from multiple databases. Retrieval based on the search tree can accurately retrieve from the database the information targeted by the user query sentence, which further ensures the accuracy of information retrieval.
Description of the drawings
To explain the solutions in this application more clearly, the drawings needed for describing the embodiments are briefly introduced below. Obviously, the drawings described below show some embodiments of this application, and those of ordinary skill in the art can obtain other drawings from them without creative effort.
FIG. 1 is an exemplary system architecture diagram to which this application can be applied;
FIG. 2 is a flowchart of an embodiment of the information retrieval method based on speech semantics according to this application;
FIG. 3 is a schematic diagram of a single-triple single-media search tree in an embodiment;
FIG. 4 is a schematic diagram of a multi-triple multi-media search tree in an embodiment;
FIG. 5 is a flowchart of a specific implementation of step S207 in FIG. 2;
FIG. 6 is a schematic diagram of answer information displayed as a bar chart in an embodiment;
FIG. 7 is a schematic diagram of answer information displayed as a line chart in an embodiment;
FIG. 8 is a schematic structural diagram of an embodiment of the information retrieval apparatus based on speech semantics according to this application;
FIG. 9 is a schematic structural diagram of an embodiment of the computer device according to this application.
Detailed description
Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by those skilled in the technical field of the present application. The terms used in the specification are only for the purpose of describing specific embodiments and are not intended to limit the present application. The terms "including" and "having" and any variations thereof in the specification, the claims, and the above description of the drawings are intended to cover non-exclusive inclusion. The terms "first", "second", and the like in the specification, the claims, or the above drawings are used to distinguish different objects rather than to describe a specific order.
Reference herein to an "embodiment" means that a specific feature, structure, or characteristic described in connection with the embodiment may be included in at least one embodiment of the present application. The appearances of this phrase in various places in the specification do not necessarily all refer to the same embodiment, nor to independent or alternative embodiments that are mutually exclusive with other embodiments. Those skilled in the art understand, explicitly and implicitly, that the embodiments described herein may be combined with other embodiments.
In order to enable those skilled in the art to better understand the solutions of the present application, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings.
As shown in Fig. 1, the system architecture 100 may include terminal devices 101, 102, and 103, a network 104, and a server 105. The network 104 is the medium that provides communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired links, wireless communication links, or fiber-optic cables.
A user can use the terminal devices 101, 102, and 103 to interact with the server 105 through the network 104, for example to receive or send messages. Various communication client applications may be installed on the terminal devices 101, 102, and 103, such as web browser applications, shopping applications, search applications, instant messaging tools, email clients, and social platform software.
The terminal devices 101, 102, and 103 may be various electronic devices that have a display screen and support web browsing, including but not limited to smartphones, tablet computers, e-book readers, MP3 (Moving Picture Experts Group Audio Layer III) players, MP4 (Moving Picture Experts Group Audio Layer IV) players, laptop computers, and desktop computers.
The server 105 may be a server that provides various services, for example a back-end server that supports the pages displayed on the terminal devices 101, 102, and 103.
It should be noted that the information retrieval method based on speech semantics provided by the embodiments of the present application is generally executed by the server; accordingly, the information retrieval device based on speech semantics is generally arranged in the server.
It should be understood that the numbers of terminal devices, networks, and servers in Fig. 1 are merely illustrative. There may be any number of terminal devices, networks, and servers, depending on implementation needs.
With continued reference to Fig. 2, a flowchart of an embodiment of the information retrieval method based on speech semantics according to the present application is shown. The method includes the following steps:
Step 201: obtain the input user query sentence.
In this embodiment, the electronic device on which the information retrieval method based on speech semantics runs (for example, the server shown in Fig. 1) can communicate with the terminal through a wired or wireless connection. The wireless connection may include, but is not limited to, 3G/4G, WiFi, Bluetooth, WiMAX, Zigbee, UWB (ultra wideband), and other wireless connection methods now known or developed in the future.
Here, the user query sentence is a query sentence entered by the user.
Specifically, the user enters the user query sentence as text on an information retrieval page, and the terminal displaying that page sends the user query sentence to the server. The user can also ask the question by voice, in which case the input speech is converted by speech recognition into a user query sentence in text form. The voice query can be made through an input method that supports voice input; alternatively, the information retrieval page can call an application programming interface provided by a third party to convert the speech, or the terminal can send the speech to the server, which performs the speech-to-text conversion.
Step 202: parse the user query sentence and replace the instance entity in the user query sentence with a conceptual entity to obtain a template query sentence; the conceptual entity is the entity type to which the instance entity belongs.
Here, the instance entity may be a named entity in the user query sentence, and the entity type may be the category attribute of the instance entity.
Specifically, the server parses the user query sentence to identify the instance entity in it, and determines the entity type of the instance entity through semantic recognition, thereby determining the conceptual entity corresponding to the instance entity. The intent of the user query sentence is to instruct the server to retrieve information related to the instance entity. Instance entities may be named entities in the user query sentence, including person names, place names, organization names, numbers, dates, currencies, addresses, and proper nouns. The server replaces the instance entity in the user query sentence with the conceptual entity to obtain the template query sentence; at the same time, the server retains the replaced instance entity so that a new retrieval logic formula can be assembled in subsequent operations.
For example, suppose the user query sentence is "When was M established?" and the server recognizes the instance entity "M". Assuming that "M" is the short name of a company, and a company is a kind of organization, the conceptual entity corresponding to "M" is "organization". "M" is then replaced with "organization" to obtain the template query sentence "When was <organization> established?", and the replaced instance entity "M" is retained.
In one embodiment, the step of parsing the user query sentence and replacing the instance entity in it with a conceptual entity to obtain the template query sentence specifically includes: identifying the instance entity in the user query sentence, and determining the entity type of the instance entity through semantic recognition to obtain the conceptual entity representing that entity type; looking up the standard entity corresponding to the instance entity in a standard entity list; and replacing the instance entity in the user query sentence with the conceptual entity to obtain the template query sentence, while storing the instance entity in association with the standard entity.
Specifically, the server parses the user query sentence and identifies the named entities in it through named entity recognition (NER, also called proper name recognition). Each recognized named entity is taken as an instance entity, and semantic recognition determines the entity type to which the instance entity belongs and hence the conceptual entity representing that entity type.
An instance entity in a user query sentence may be an abbreviation or a non-standard name, whereas the information stored in the database uses standard descriptions. The standard entities are stored in the standard entity list.
The server obtains the pre-built standard entity list and uses fuzzy matching to find the standard entity corresponding to the instance entity. The server replaces the instance entity in the user query sentence with the conceptual entity to obtain the template query sentence, and stores the standard entity in association with the instance entity in an entity association table. The entity association table stores the instance entities found in user query sentences together with their corresponding standard entities.
For example, the instance entity in the user query sentence is "M", which is a short name, while the database stores the full name "M Co., Ltd."; "M Co., Ltd." is the standard entity corresponding to "M". After template replacement, the user query sentence becomes "When was <organization> established?", and the instance entity "M" is stored in association with the standard entity "M Co., Ltd." so that a new retrieval logic formula can be assembled later.
In this embodiment, the instance entity in the user query sentence is identified, and its entity type and the conceptual entity representing that type are determined. The standard entity corresponding to the instance entity is looked up, and the instance entity in the user query sentence is replaced with the conceptual entity. This turns diverse user query sentences into a standardized form and reduces the personalized information they contain, which helps the later similarity-based lookup of inventory query sentences and ensures the accuracy of information retrieval. The instance entity and the standard entity are stored in association so that a new retrieval logic formula can be assembled later.
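The template replacement and entity association described above can be sketched as follows. This is a minimal illustration, not the implementation of the application: the lookup table `ENTITY_TYPES` stands in for a trained NER model, `STANDARD_ENTITIES` is a hypothetical standard entity list, and the fuzzy match uses `difflib.SequenceMatcher` from the Python standard library as one possible similarity measure.

```python
from difflib import SequenceMatcher

# Hypothetical lookup tables; a real system would use a trained NER model
# and a much larger standard entity list.
ENTITY_TYPES = {"M": "organization", "Zhang San": "person"}
STANDARD_ENTITIES = ["M Co., Ltd.", "N Co., Ltd."]


def to_template(query):
    """Replace instance entities with <type> placeholders and record the
    standard entity associated with each replaced instance entity."""
    template = query
    associations = {}  # entity association table: instance -> standard entity
    for mention, entity_type in ENTITY_TYPES.items():
        if mention in template:
            template = template.replace(mention, f"<{entity_type}>")
            # Fuzzy match: pick the standard entity most similar to the mention.
            associations[mention] = max(
                STANDARD_ENTITIES,
                key=lambda name: SequenceMatcher(None, mention, name).ratio(),
            )
    return template, associations


template, associations = to_template("When was M established?")
print(template)      # When was <organization> established?
print(associations)  # {'M': 'M Co., Ltd.'}
```

Plain substring replacement is used only to keep the sketch short; it would misfire on mentions that occur inside other words.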
Step 203: calculate the similarity between the template query sentence and each inventory query sentence in the question corpus.
Here, an inventory query sentence is a sentence stored in the question corpus. A retrieval logic formula is another representation of an inventory query sentence; it is used to build the retrieval tree and expresses the retrieval logic. Each inventory query sentence corresponds to a retrieval logic formula, and multiple inventory query sentences may correspond to the same retrieval logic formula.
Specifically, the server accesses the question corpus and converts the template query sentence and each inventory query sentence in the corpus into sentence vectors. A preset similarity formula is then used to calculate the similarity between the sentence vector of the template query sentence and the sentence vector of each inventory query sentence.
In one embodiment, the similarity can be calculated with methods such as cosine similarity, edit distance, the Jaccard coefficient, or TF-IDF (which adds the inverse document frequency, IDF, to the term frequency, TF). The cosine similarity is calculated according to the following formula (1):
$$\mathrm{similarity}=\cos(\theta)=\frac{\mathrm{QuestionA}\cdot\mathrm{QuestionB}}{\lVert\mathrm{QuestionA}\rVert\,\lVert\mathrm{QuestionB}\rVert}\tag{1}$$
Here, QuestionA is the sentence vector of the template query sentence, and QuestionB is the sentence vector of an inventory query sentence.
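As a small illustration of the cosine similarity between two sentence vectors, the sketch below uses whitespace-token counts as the sentence vector. That vectorization is an assumption made only for this example; the text does not specify how sentence vectors are produced.

```python
import math
from collections import Counter


def sentence_vector(sentence):
    """Toy sentence vector: counts of lower-cased whitespace tokens
    (an assumption; the vectorization method is not given in the text)."""
    return Counter(sentence.lower().split())


def cosine_similarity(vec_a, vec_b):
    """Dot product of the two vectors divided by the product of their norms."""
    dot = sum(vec_a[t] * vec_b[t] for t in vec_a.keys() & vec_b.keys())
    norm_a = math.sqrt(sum(v * v for v in vec_a.values()))
    norm_b = math.sqrt(sum(v * v for v in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty sentence is treated as similar to nothing
    return dot / (norm_a * norm_b)


question_a = sentence_vector("when was <organization> established")
question_b = sentence_vector("when was <organization> founded")
print(cosine_similarity(question_a, question_b))  # 0.75
```

Three of the four tokens are shared and each vector has norm 2, so the similarity is 3 / (2 × 2) = 0.75.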
Step 204: determine, according to the calculated similarity, the inventory query sentence that matches the template query sentence and the retrieval logic formula corresponding to that inventory query sentence.
Specifically, the server compares the calculated similarities with a preset similarity threshold and, among the similarities greater than the threshold, selects the inventory query sentence with the maximum similarity as the one matching the template query sentence.
The server looks up in the question corpus the retrieval logic formula corresponding to that inventory query sentence, and establishes the mapping user query sentence -> template query sentence -> inventory query sentence -> retrieval logic formula.
For example, the user query sentence is "When was M established?", and replacing the instance entity yields the template query sentence "When was <organization> established?". Through the similarity calculation, the server finds that the matching inventory query sentence is "<organization> date of establishment?", whose retrieval logic formula is <V1:Unary(class='organization',value='N')><A:Binary(V1,registration date,A?)>. The "N" in this retrieval logic formula is a variable: it is the standard entity that was retrieved in the previous retrieval.
The server thus establishes the following mapping: user query sentence "When was M established?" -> template query sentence "When was <organization> established?" -> inventory query sentence "<organization> date of establishment?" -> retrieval logic formula <V1:Unary(class='organization',value='N')><A:Binary(V1,registration date,A?)>.
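The threshold-and-maximum selection of step 204 can be sketched as follows. The corpus contents, the threshold of 0.1, and the token-level Jaccard coefficient used as the similarity measure are all assumptions chosen for illustration; a practical system would more likely compare semantic sentence vectors.

```python
def jaccard(a, b):
    """Jaccard coefficient over lower-cased whitespace tokens."""
    set_a, set_b = set(a.lower().split()), set(b.lower().split())
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 0.0


# Hypothetical question corpus: inventory query sentence -> retrieval logic formula.
CORPUS = {
    "<organization> date of establishment?":
        "<V1:Unary(class='organization',value='N')>"
        "<A:Binary(V1,registration date,A?)>",
    "who is the legal representative of <organization>?":
        "<V1:Unary(class='organization',value='N')>"
        "<A:Binary(V1,legal representative,A?)>",
}


def match_inventory_sentence(template, corpus, threshold=0.1, similarity=jaccard):
    """Return (inventory sentence, logic formula) for the most similar
    sentence whose similarity exceeds the threshold, or None."""
    best_sentence, best_score = None, threshold
    for sentence in corpus:
        score = similarity(template, sentence)
        if score > best_score:
            best_sentence, best_score = sentence, score
    if best_sentence is None:
        return None
    return best_sentence, corpus[best_sentence]


match = match_inventory_sentence("When was <organization> established?", CORPUS)
print(match[0])  # <organization> date of establishment?
```

Returning `None` when nothing clears the threshold leaves it to the caller to decide how an unmatched query is handled, which the text does not describe.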
Step 205: update the retrieval logic formula according to the instance entity.
Specifically, the retrieval logic formula contains a standard entity, which is the standard entity retrieved in the previous retrieval. The server needs to update the retrieval logic formula according to the instance entity in the user query sentence of the current retrieval.
In one embodiment, the step of updating the retrieval logic formula according to the instance entity specifically includes: obtaining the standard entity stored in association with the instance entity; and replacing the standard entity in the retrieval logic formula with the obtained standard entity.
Specifically, in the retrieval logic formula corresponding to the inventory query sentence, the position occupied by "N" is a variable: the other parts of the formula are fixed, while the "N" in the value field can change. The previous retrieval may have asked for "the date of establishment of N", so the formula still contains "N" afterwards. The current retrieval is about "M", so "N" must be replaced with "M Co., Ltd."; otherwise the generated retrieval tree would still target "N".
The server obtains from the entity association table the standard entity associated with the instance entity in the user query sentence, and replaces the standard entity in the retrieval logic formula with it.
In this embodiment, the standard entity in the retrieval logic formula is replaced with the standard entity associated with the instance entity in the user query sentence. The updated retrieval logic formula therefore targets the current retrieval, which ensures that the information relevant to the current retrieval can be obtained accurately from the database.
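A minimal sketch of this update, assuming the retrieval logic formula is handled as a plain string with a single `value='...'` slot (the internal representation is not specified in the text, so this is an illustrative choice):

```python
import re


def update_logic_formula(formula, standard_entity):
    """Replace the entity left in the value slot by the previous retrieval
    with the standard entity of the current retrieval."""
    # A function is used as the replacement so that backslashes in the
    # entity name are never interpreted as regex escapes.
    return re.sub(
        r"value='[^']*'",
        lambda _match: f"value='{standard_entity}'",
        formula,
        count=1,
    )


# Entity association table produced in step 202 (hypothetical contents).
associations = {"M": "M Co., Ltd."}
formula = ("<V1:Unary(class='organization',value='N')>"
           "<A:Binary(V1,registration date,A?)>")

print(update_logic_formula(formula, associations["M"]))
# <V1:Unary(class='organization',value='M Co., Ltd.')><A:Binary(V1,registration date,A?)>
```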
Step 206: generate a retrieval tree based on the updated retrieval logic formula.
Here, the retrieval tree may be a storage structure based on a binary tree.
Specifically, the retrieval logic formula specifies the information that ultimately has to be retrieved. When the retrieval tree is built from the updated retrieval logic formula, that information becomes the root node. Different retrieval logic formulas may correspond to different retrieval types, and different retrieval types correspond to different retrieval tree structures. The server fills the retrieval tree structure according to the retrieval logic formula to generate the retrieval tree.
The retrieval tree may be a binary tree in which every internal node is a piece of information to be retrieved, the left and right branches of a node are its retrieval conditions, and the root node is the information that ultimately has to be retrieved.
In one embodiment, the step of generating the retrieval tree based on the updated retrieval logic formula specifically includes: identifying the retrieval type of the retrieval logic formula; when the retrieval type is single-triple single-medium retrieval, generating a single-triple single-medium retrieval tree; and when the retrieval type is multi-triple multi-medium retrieval, generating a multi-triple multi-medium retrieval tree.
Here, the retrieval type is determined by the attributes of the retrieved object and the storage medium accessed during retrieval; a storage medium may be a database that stores information.
Specifically, different retrieval logic formulas may correspond to different retrieval types, which include single-triple single-medium retrieval and multi-triple multi-medium retrieval.
When the retrieval type is single-triple single-medium retrieval, a single-triple single-medium retrieval tree is generated. For example, when retrieving a single attribute value of a single entity, the logical form of the retrieval tree is
<entity=E><attr=A><attr_value=?>
Here, E denotes the standard entity, attr denotes an attribute of the standard entity (attribute A in this case), and attr_value denotes the value of attribute A.
The retrieval tree structure consists of the root node "attribute value", the left leaf node "entity E", and the right leaf node "attribute A". This structure requires only one retrieval in a single storage medium.
For example, when retrieving the registration date of M, the logical form of the retrieval tree is:
<entity=M Co., Ltd.><attr=registration date><attr_value=?>
The corresponding retrieval tree structure consists of the root node "attribute value", the left leaf node "M Co., Ltd.", and the right leaf node "registration date". The generated retrieval tree is shown in Fig. 3.
When the retrieval type is multi-triple multi-medium retrieval, a multi-triple multi-medium retrieval tree is generated. Both the single-triple single-medium retrieval tree and the multi-triple multi-medium retrieval tree are binary trees, but they differ in depth and shape. For example, when retrieving an attribute value of an entity that has a certain relation to the instance entity, the logical form of the retrieval tree is
<entity=(<head_entity=HE><relation='R'><tail_entity=?>)><attr=A><attr_value=?>
Here, HE is the standard entity and serves as the head entity head_entity in the retrieval tree; relation='R' indicates that HE stands in relation R to another entity, which is the tail entity tail_entity in the retrieval tree; attr=A denotes attribute A of the tail entity; and attr_value denotes the value of attribute A.
The retrieval tree structure consists of the root node "attribute value", a left subtree (with the left leaf node "entity HE" and the right leaf node "relation R"), and the right leaf node "attribute A". This structure requires one retrieval in each of two storage media.
For example, when retrieving the registration date of the entity in which M has invested, the logical form of the retrieval tree is
<entity=(<head_entity=M Co., Ltd.><relation=investment><tail_entity=?>)><attr=registration date><attr_value=?>
The corresponding retrieval tree structure consists of the root node "attribute value", a left subtree (with the left leaf node "M Co., Ltd." and the right leaf node "investment relation"), and the right leaf node "registration date". The generated retrieval tree is shown in Fig. 4.
In this embodiment, a retrieval tree corresponding to the retrieval type of the retrieval logic formula is generated. The retrieval tree indicates how to retrieve information from the database, which ensures that the information related to the user query sentence can be obtained accurately from the database.
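The two tree shapes described above can be sketched with a small binary-tree node type. The class and function names (`Node`, `single_triple_tree`, `multi_triple_tree`) are hypothetical and only mirror the structures of Fig. 3 and Fig. 4 as described in the text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    label: str                      # information this node stands for
    left: Optional["Node"] = None   # first retrieval condition
    right: Optional["Node"] = None  # second retrieval condition


def single_triple_tree(entity, attr):
    """Root 'attr_value' with leaves <entity> and <attr>: one retrieval."""
    return Node("attr_value", Node(entity), Node(attr))


def multi_triple_tree(head_entity, relation, attr):
    """The left subtree first resolves the tail entity; the root then
    retrieves that entity's attribute value: two retrievals."""
    tail = Node("tail_entity", Node(head_entity), Node(relation))
    return Node("attr_value", tail, Node(attr))


def depth(node):
    """Number of levels in the tree (0 for an empty tree)."""
    return 0 if node is None else 1 + max(depth(node.left), depth(node.right))


single = single_triple_tree("M Co., Ltd.", "registration date")
multi = multi_triple_tree("M Co., Ltd.", "investment", "registration date")
print(depth(single), depth(multi))  # 2 3
```

The differing depths correspond to the statement that the two kinds of tree are both binary trees but differ in depth and shape.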
Step 207: perform information retrieval on the database according to the retrieval tree, and display the retrieved answer information.
Specifically, the nodes of the retrieval tree are the pieces of information to be retrieved, the left and right branches of each node are the retrieval conditions needed to retrieve that node, and the root node of the binary tree is the information that ultimately has to be retrieved. The server performs a depth-first traversal of the retrieval tree to check the feasibility of the tree and to obtain a retrieval strategy.
For example, if the user query sentence is "What is the registration date of Zhang San?", the left leaf node of the retrieval tree is "Zhang San" and the right leaf node is "registration date". During the depth-first traversal the server checks whether the nodes are grammatically compatible. "Zhang San" is a person name and does not match "registration date"; that is, retrieving a "registration date" for "Zhang San" is not feasible, so an error message is returned. Besides checking the feasibility of the retrieval tree, the depth-first traversal also determines the retrieval steps in the database: the left and right branches of each node are retrieved first to obtain the information for that node, and the information for the root node is retrieved last. The retrieval steps determined in this way constitute the retrieval strategy. The server retrieves from the databases according to the retrieval strategy and, once the answer information has been retrieved, returns it to the terminal for display.
Here, depth-first search (DFS) proceeds until it reaches a leaf node of the retrieval tree, that is, a node without any branches. In a depth-first search of the retrieval tree, a single chain is first searched completely; when the end of the chain is reached and there are no further branches, the search returns to the previous node and continues exploring the other chains of the tree. The depth-first search ends when no unexplored chain remains in the whole retrieval tree.
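The feasibility check and the derivation of the retrieval steps can be sketched as a post-order depth-first traversal. Trees are written here as nested tuples `(label, left, right)` with plain strings as leaves; the tables `ENTITY_CLASS` and `ALLOWED_ATTRS` are hypothetical stand-ins for whatever schema knowledge the server uses, and the check covers only the case where an attribute is requested directly from a known entity.

```python
# Hypothetical schema knowledge used for the feasibility check.
ENTITY_CLASS = {"M Co., Ltd.": "organization", "Zhang San": "person"}
ALLOWED_ATTRS = {
    "organization": {"registration date"},
    "person": {"date of birth"},
}


def check_and_plan(tree):
    """Depth-first (post-order) traversal: the branches of a node are
    visited before the node itself, so the root is resolved last.
    Returns the ordered retrieval steps or raises ValueError."""
    steps = []

    def visit(node):
        if isinstance(node, str):  # leaf: literal entity, relation, or attribute
            return node
        label, left, right = node
        left_val = visit(left)
        right_val = visit(right)
        # Feasibility: an attribute must exist for the entity's class.
        if label == "attr_value" and left_val in ENTITY_CLASS:
            entity_class = ENTITY_CLASS[left_val]
            if right_val not in ALLOWED_ATTRS[entity_class]:
                raise ValueError(f"'{left_val}' has no attribute '{right_val}'")
        steps.append((label, left_val, right_val))
        return f"result of step {len(steps)}"

    visit(tree)
    return steps


multi = ("attr_value",
         ("tail_entity", "M Co., Ltd.", "investment"),
         "registration date")
for step in check_and_plan(multi):
    print(step)
# ('tail_entity', 'M Co., Ltd.', 'investment')
# ('attr_value', 'result of step 1', 'registration date')
```

Calling `check_and_plan(("attr_value", "Zhang San", "registration date"))` raises `ValueError`, which corresponds to the error message returned for the "Zhang San" example.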
In one embodiment, after the step of performing information retrieval on the database according to the retrieval tree and displaying the retrieved answer information, the method further includes: setting the template query sentence as an inventory query sentence to update the question corpus; and associating the newly added inventory query sentence in the question corpus with the updated retrieval logic formula.
Specifically, after completing the retrieval, the server adds the template query sentence obtained from the user query sentence to the question corpus as a new inventory query sentence, and associates the retrieval logic formula updated according to the standard entity with the newly added inventory query sentence.
The newly added inventory query sentence can take part in later retrievals, so the question corpus is continuously enriched, which improves the robustness of the system and its ability to handle different question sentences.
In this embodiment, the template query sentence is added to the question corpus and matched with a retrieval logic formula, which enriches the inventory query sentences in the question corpus and improves the system's ability to handle various user query sentences.
In this embodiment, the instance entities in the acquired user query sentence are first replaced to obtain a template query sentence, which strips the personalized information from the user query sentence. The similarity between the template query sentence and each inventory query sentence in the corpus is then calculated, and the inventory query sentence matching the user query sentence, together with its retrieval logic formula, is determined according to the similarity. This improves the ability to handle user query sentences in various forms and ensures the accuracy and usability of information retrieval. A retrieval tree is generated from the retrieval logic formula and indicates how to retrieve information from multiple databases; retrieving on the basis of the retrieval tree accurately obtains from the databases the information targeted by the user query sentence, which further ensures the accuracy of information retrieval.
Further, as shown in FIG. 5, the foregoing step 207 may include:
Step 2071: perform a depth-first traversal of the retrieval tree to determine the retrieval strategy corresponding to the retrieval tree, and determine the information type based on the retrieval strategy.
The information type is the type of information retrieved for the standard entities. It includes retrieval of a single attribute of a single entity, an entity relationship, multiple attributes of a single entity, a single attribute of multiple entities, and an attribute change trend (covering both the trend of multiple attributes of a single entity and the trend of a single attribute across multiple entities).
Before retrieval, the various kinds of information must be stored in an orderly manner. Triple data of the form <entity-attribute-attribute value> must support real-time retrieval, analysis and filtering, and can be stored in the distributed, scalable database ElasticSearch. In ElasticSearch, the retrieval frequency of each standard entity is obtained from big data or historical data, and an inverted index over the standard entities is built according to retrieval frequency so that the required information can be retrieved as quickly as possible. Entity attributes that are not used as retrieval conditions (long-text data such as announcements and news) are stored in the traditional relational database PostgreSQL to reduce the load on ElasticSearch. Triple data of the form <head entity-relationship-tail entity> is stored in Neo4j, a graph database belonging to the NoSQL (Not Only SQL, non-relational database) family.
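The storage rules above amount to a small routing decision. The sketch below is only an illustration of that decision; the function name `choose_store`, the `triple_kind` labels and the `searchable` flag are hypothetical and do not come from the application.

```python
from enum import Enum


class Store(Enum):
    ELASTICSEARCH = "ElasticSearch"
    POSTGRESQL = "PostgreSQL"
    NEO4J = "Neo4j"


def choose_store(triple_kind: str, searchable: bool = True) -> Store:
    """Pick a database for a triple.

    triple_kind: "attribute" for <entity-attribute-attribute value>,
                 "relation" for <head entity-relationship-tail entity>.
    searchable:  False for attributes never used as retrieval conditions
                 (long text such as announcements or news).
    """
    if triple_kind == "relation":
        return Store.NEO4J
    if triple_kind == "attribute":
        return Store.ELASTICSEARCH if searchable else Store.POSTGRESQL
    raise ValueError(f"unknown triple kind: {triple_kind!r}")
```

For instance, `choose_store("attribute", searchable=False)` selects PostgreSQL, which keeps long text out of the search index.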
The server determines the retrieval strategy through the depth-first traversal; the retrieval strategy indicates how to obtain information from the databases.
For example, for the retrieval tree in FIG. 3, the retrieval strategy is: access the ElasticSearch database and retrieve the registration date of M Co., Ltd. from it. For the retrieval tree in FIG. 4, the retrieval strategy is: retrieve from the Neo4j database the tail entities that have an investment relationship with M Co., Ltd., then retrieve the registration dates of those tail entities from the ElasticSearch database, and finally splice together, keyed on M Co., Ltd., the answer information retrieved from the different databases.
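One way to picture how a depth-first traversal turns a retrieval tree into an ordered strategy is the sketch below. The tree shape is an assumption: FIG. 4 is not reproduced here, so the `Node` class and the nesting of the three steps are hypothetical. The traversal is post-order, so a step that another step depends on is emitted first.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    """Hypothetical retrieval-tree node: an action and the database it runs on."""
    action: str
    database: Optional[str] = None          # None for steps that only combine results
    children: list["Node"] = field(default_factory=list)


def plan(node: Node) -> list[tuple[Optional[str], str]]:
    """Post-order depth-first traversal: children (prerequisites) before the node."""
    steps: list[tuple[Optional[str], str]] = []
    for child in node.children:
        steps.extend(plan(child))
    steps.append((node.database, node.action))
    return steps


# A tree loosely modelled on the FIG. 4 example in the text.
fig4_like = Node(
    "splice answers keyed on M Co., Ltd.",
    children=[
        Node(
            "retrieve registration date of each tail entity",
            "ElasticSearch",
            children=[Node("retrieve tail entities invested in by M Co., Ltd.", "Neo4j")],
        )
    ],
)
strategy = plan(fig4_like)
```

Under these assumptions the resulting order is Neo4j, then ElasticSearch, then the splicing step, matching the strategy described for FIG. 4.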
The information type can be determined from the retrieval strategy. For example, when the retrieval strategy is to access the ElasticSearch database and retrieve the registration date of M Co., Ltd., only one attribute ("registration date") of one entity ("M Co., Ltd.") needs to be retrieved, so the information type is single entity, single attribute. When the 2019 trade volume of six companies in an industry is to be retrieved, the same attribute ("trade volume") of six standard entities must be retrieved, so the information type is multiple entities, single attribute.
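The two examples above suggest that the information type follows from how many entities and attributes the strategy touches. A minimal sketch under that assumption is given below; the function `information_type` and its `is_relation` and `is_trend` flags are hypothetical, and the application may derive the type differently.

```python
def information_type(
    entities: list[str],
    attributes: list[str],
    is_relation: bool = False,
    is_trend: bool = False,
) -> str:
    """Classify a retrieval by entity and attribute counts (illustrative only)."""
    if is_trend:
        return "attribute change trend"
    if is_relation:
        return "entity relationship"
    if not entities or not attributes:
        raise ValueError("at least one entity and one attribute are required")
    if len(entities) == 1 and len(attributes) == 1:
        return "single entity, single attribute"
    if len(entities) == 1:
        return "single entity, multiple attributes"
    if len(attributes) == 1:
        return "multiple entities, single attribute"
    raise ValueError("multiple entities with multiple attributes is not covered here")


companies = [f"Company {i}" for i in range(1, 7)]  # six placeholder company names
```

With these inputs, `information_type(companies, ["trade volume"])` yields the multiple-entity, single-attribute case from the text.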
Step 2072: retrieve information from the databases according to the retrieval strategy to obtain answer information.
Specifically, the server accesses the databases according to the determined retrieval strategy and extracts information from them to obtain the answer information.
Step 2073: display the answer information according to the information type.
Specifically, the server determines the display mode of the answer information according to the information type; display modes include text and charts. The server sends the answer information to the terminal, and the terminal displays it in the determined display mode.
In one embodiment, the step of displaying the answer information according to the information type specifically includes: when the information type is single entity, single attribute, or entity relationship, displaying the answer information as text; when the information type is single entity, multiple attributes, or multiple entities, single attribute, displaying the answer information as a bar chart; and when the information type is attribute change trend, displaying the answer information as a line chart.
Specifically, when the information type determined from the retrieval tree is single entity, single attribute, the answer information is displayed as descriptive text. Taking FIG. 3 as an example, the descriptive text has the format "the <attribute name> of <entity> is <attribute value>", which gives: "the registration date of M Co., Ltd. is year xxxx, month xx, day xx". The answer information has dimension 1*2.
When the information type is single entity, multiple attributes, or multiple entities, single attribute, the answer information is displayed as a bar chart. For example, the answer information displayed when retrieving the trade volume of six companies in an industry is shown in FIG. 6; the answer information may also include the data date and the name of each company. The answer information shown in a bar chart has dimension 1*N (N>2) or N*2, where N is a positive integer.
When the question contains keywords such as "trend" or "change", or contains a time series, the information type is attribute change trend and the answer information is displayed as a line chart. For example, when retrieving the trend of the quarterly sales of M Co., Ltd. in 2019, the displayed answer information is shown in FIG. 7; the answer information may also include the data date.
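The display rules above reduce to a lookup from information type to display mode, plus a text template for the single-value case. The sketch below illustrates this; the names `DISPLAY_MODE` and `describe` are hypothetical, and the English wording of the template is an assumption.

```python
# Information type -> display mode, following the three cases in the text.
DISPLAY_MODE = {
    "single entity, single attribute": "text",
    "entity relationship": "text",
    "single entity, multiple attributes": "bar chart",
    "multiple entities, single attribute": "bar chart",
    "attribute change trend": "line chart",
}


def describe(entity: str, attribute: str, value: str) -> str:
    """Render a single-entity, single-attribute answer as descriptive text."""
    return f"The {attribute} of {entity} is {value}."


# The date is left as a placeholder, as in the source.
sentence = describe("M Co., Ltd.", "registration date", "xxxx-xx-xx")
```

Keeping the mapping as data rather than branching logic makes it straightforward to add further display modes later.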
In this embodiment, the answer information is displayed as text or graphics according to the type of information retrieved, which makes the display of answer information more intelligent.
Step 2074: upload the answer information to the blockchain.
Specifically, the corresponding digest information is obtained from the answer information by hashing it, for example with the SHA-256 algorithm. Uploading the digest information to the blockchain ensures its security and its fairness and transparency to users. User equipment can download the digest information from the blockchain to verify whether the answer information has been tampered with.
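The digest step can be sketched with Python's standard `hashlib` module. The blockchain upload itself is outside the scope of this sketch; the function names and the JSON serialization of the answer are assumptions, chosen so that the same answer always produces the same bytes before hashing.

```python
import hashlib
import hmac
import json


def answer_digest(answer: dict) -> str:
    """Return the SHA-256 hex digest of a canonical JSON form of the answer."""
    canonical = json.dumps(answer, sort_keys=True, ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def is_untampered(answer: dict, on_chain_digest: str) -> bool:
    """Compare a freshly computed digest with the one downloaded from the chain."""
    return hmac.compare_digest(answer_digest(answer), on_chain_digest)


answer = {"entity": "M Co., Ltd.", "attribute": "registration date", "value": "xxxx-xx-xx"}
digest = answer_digest(answer)  # this value would be uploaded to the blockchain
```

Sorting the keys before hashing matters: two dictionaries with the same content but different key order would otherwise yield different digests and a false tampering alarm.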
In this embodiment, a depth-first traversal of the retrieval tree yields the retrieval strategy, and the information type is determined based on that strategy. With the retrieval strategy, the server can obtain the required information from the databases faster and more accurately; with the information type, the answer information can be displayed intelligently; and the answer information is uploaded to the blockchain to ensure its security, fairness and transparency.
The speech-semantics-based information retrieval method in this application involves neural networks, natural language processing, speech processing, and knowledge representation and reasoning in the field of artificial intelligence; it may also involve smart living in the field of smart cities.
Those of ordinary skill in the art will understand that all or part of the processes in the methods of the above embodiments can be carried out by computer-readable instructions directing the relevant hardware. The computer-readable instructions can be stored in a computer-readable storage medium and, when executed, may include the processes of the method embodiments described above. The storage medium may be a non-volatile storage medium such as a magnetic disk, an optical disc or a read-only memory (ROM), or a random access memory (RAM).
It should be understood that although the steps in the flowcharts of the drawings are shown in the order indicated by the arrows, they are not necessarily executed in that order. Unless explicitly stated herein, there is no strict restriction on the order of these steps, and they can be executed in other orders. Moreover, at least some of the steps in the flowcharts may include multiple sub-steps or stages. These sub-steps or stages are not necessarily completed at the same moment but can be executed at different times, and they are not necessarily executed sequentially but may be performed in turn or alternately with other steps or with at least part of the sub-steps or stages of other steps.
With further reference to FIG. 8, as an implementation of the method shown in FIG. 2, this application provides an embodiment of a speech-semantics-based information retrieval device. The device embodiment corresponds to the method embodiment shown in FIG. 2, and the device can be applied to various electronic devices.
As shown in FIG. 8, the speech-semantics-based information retrieval device 300 of this embodiment includes: a sentence acquisition module 301, an entity replacement module 302, a similarity calculation module 303, a sentence determination module 304, a logic formula update module 305, a retrieval tree generation module 306 and an information retrieval module 307, wherein:
The sentence acquisition module 301 is configured to acquire the input user query sentence.
The entity replacement module 302 is configured to parse the user query sentence and replace the instance entity in the user query sentence with a conceptual entity to obtain a template query sentence; the conceptual entity is the entity type to which the instance entity belongs.
The similarity calculation module 303 is configured to calculate the similarity between the template query sentence and each inventory query sentence in the question corpus.
The sentence determination module 304 is configured to determine, according to the calculated similarity, the inventory query sentence matching the template query sentence and the retrieval logic formula corresponding to that inventory query sentence.
The logic formula update module 305 is configured to update the retrieval logic formula according to the instance entity.
The retrieval tree generation module 306 is configured to generate a retrieval tree based on the updated retrieval logic formula.
The information retrieval module 307 is configured to retrieve information from the databases according to the retrieval tree and display the retrieved answer information.
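The work of modules 303 and 304 can be illustrated with a toy matcher. This passage does not restate how similarity is computed, so the Jaccard token overlap used below is a stand-in chosen for brevity, not the measure claimed by the application; the corpus contents and formula labels are likewise hypothetical.

```python
def jaccard(a: str, b: str) -> float:
    """Token-set overlap between two sentences, in the range [0, 1]."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    union = tokens_a | tokens_b
    return len(tokens_a & tokens_b) / len(union) if union else 0.0


def best_match(template: str, corpus: dict[str, str]) -> tuple[str, str]:
    """Return the most similar inventory sentence and its retrieval logic formula.

    Raises ValueError if the corpus is empty.
    """
    sentence = max(corpus, key=lambda s: jaccard(template, s))
    return sentence, corpus[sentence]


inventory = {
    "when was <company> registered": "formula-registration-date",
    "what is the trade volume of <company>": "formula-trade-volume",
}
matched_sentence, matched_formula = best_match("when was <company> registered ?", inventory)
```

A production system would more plausibly compare sentence embeddings, but the control flow (score every inventory sentence, keep the best, return its formula) stays the same.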
In this embodiment, the instance entities in the acquired user query sentence are first replaced to obtain a template query sentence, which strips the personalized content from the user query sentence. The similarity between the template query sentence and each inventory query sentence in the corpus is then calculated, and the inventory query sentence matching the user query sentence, together with its retrieval logic formula, is determined according to the similarity. This improves the ability to handle user query sentences of various forms and ensures the accuracy and usability of information retrieval. A retrieval tree is generated from the retrieval logic formula; the retrieval tree indicates how to retrieve information from multiple databases, so retrieval based on it can accurately obtain the information targeted by the user query sentence, further ensuring the accuracy of information retrieval.
In some optional implementations of this embodiment, the entity replacement module 302 includes a sentence parsing sub-module, a standard query sub-module and an entity replacement sub-module, wherein:
The sentence parsing sub-module is configured to identify the instance entity in the user query sentence and to determine the entity type of the instance entity through semantic recognition, so as to obtain the conceptual entity representing that entity type.
The standard query sub-module is configured to look up, in the standard entity list, the standard entity corresponding to the instance entity.
The entity replacement sub-module is configured to replace the instance entity in the user query sentence with the conceptual entity to obtain the template query sentence, and to store the instance entity in association with the standard entity.
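The three sub-modules above can be sketched together in a few lines. Dictionary lookup stands in for the semantic recognition step, and the tables `ENTITY_TYPES` and `STANDARD_ENTITIES`, the alias "M Corp" and the `<company>` placeholder are all hypothetical.

```python
# Instance entity -> conceptual entity (entity type); stands in for semantic recognition.
ENTITY_TYPES = {"M Corp": "<company>", "M Co., Ltd.": "<company>"}
# Instance entity -> standard entity; stands in for the standard entity list.
STANDARD_ENTITIES = {"M Corp": "M Co., Ltd.", "M Co., Ltd.": "M Co., Ltd."}


def to_template(query: str) -> tuple[str, dict[str, str]]:
    """Replace instance entities with conceptual entities.

    Returns the template query sentence and the instance -> standard entity
    associations to be stored for the later formula update.
    """
    template = query
    linked: dict[str, str] = {}
    # Try longer names first so a short alias cannot split a longer one.
    for instance in sorted(ENTITY_TYPES, key=len, reverse=True):
        if instance in template:
            template = template.replace(instance, ENTITY_TYPES[instance])
            linked[instance] = STANDARD_ENTITIES[instance]
    return template, linked


template, linked = to_template("When was M Corp registered?")
```

Here the colloquial "M Corp" disappears from the template but stays linked to its standard form, which is what the formula update needs afterwards.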
In this embodiment, the instance entity in the user query sentence is identified, and its entity type and the conceptual entity representing that type are determined. The standard entity corresponding to the instance entity is looked up, and the instance entity in the user query sentence is replaced with the conceptual entity. This moves the user query sentence from varied to standardized wording and reduces its personalized content, which helps the subsequent similarity-based lookup of inventory query sentences and ensures the accuracy of information retrieval. The instance entity and the standard entity are stored in association so that a new retrieval logic formula can be assembled later.
In some optional implementations of this embodiment, the logic formula update module 305 includes an entity acquisition sub-module and a standard replacement sub-module, wherein:
The entity acquisition sub-module is configured to acquire the standard entity stored in association with the instance entity.
The standard replacement sub-module is configured to replace the standard entity in the retrieval logic formula with the acquired standard entity.
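The replacement performed by the standard replacement sub-module can be sketched as below. Representing the retrieval logic formula as a list of (subject, predicate, object) triples is an assumption made for illustration, as are the function name and the entity "N Co., Ltd.".

```python
Triple = tuple[str, str, str]


def update_formula(formula: list[Triple], old_standard: str, new_standard: str) -> list[Triple]:
    """Return a copy of the formula with one standard entity swapped for another."""
    return [
        tuple(new_standard if term == old_standard else term for term in triple)
        for triple in formula
    ]


# Formula stored with a matched inventory sentence (left over from an earlier query).
stored = [("N Co., Ltd.", "registration_date", "?x")]
# Standard entity associated with the instance entity of the current query.
updated = update_formula(stored, "N Co., Ltd.", "M Co., Ltd.")
```

Returning a new list leaves the stored formula untouched, so the corpus entry can still serve other queries.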
In this embodiment, the standard entity in the retrieval logic formula is replaced with the standard entity associated with the instance entity in the user query sentence. The resulting retrieval logic formula is specific to the current retrieval, which ensures that information relevant to this retrieval can be obtained accurately from the databases.
In some optional implementations of this embodiment, the retrieval tree generation module 306 includes a type identification sub-module and a retrieval tree generation sub-module, wherein:
The type identification sub-module is configured to identify the retrieval type of the retrieval logic formula.
The retrieval tree generation sub-module is configured to generate a single-triple single-medium retrieval tree when the retrieval type is single-triple single-medium retrieval.
The retrieval tree generation sub-module is further configured to generate a multi-triple multi-medium retrieval tree when the retrieval type is multi-triple multi-medium retrieval.
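How the type identification sub-module might tell the two retrieval types apart is sketched below, covering type identification only and not tree construction. It assumes the formula is a list of triples and that relationship predicates map to Neo4j while attribute predicates map to ElasticSearch; the predicate names and the "other" fallback are hypothetical.

```python
Triple = tuple[str, str, str]

RELATION_PREDICATES = {"invests_in"}  # hypothetical set of relationship predicates


def medium(triple: Triple) -> str:
    """Storage medium that would answer this triple."""
    return "Neo4j" if triple[1] in RELATION_PREDICATES else "ElasticSearch"


def retrieval_type(formula: list[Triple]) -> str:
    """Classify a retrieval logic formula by triple count and media involved."""
    if not formula:
        raise ValueError("the formula must contain at least one triple")
    if len(formula) == 1:
        return "single-triple single-medium"
    if len({medium(t) for t in formula}) > 1:
        return "multi-triple multi-medium"
    return "other"  # combinations not described in this passage


simple = [("M Co., Ltd.", "registration_date", "?x")]
chained = [("M Co., Ltd.", "invests_in", "?y"), ("?y", "registration_date", "?x")]
```

The two sample formulas correspond loosely to the FIG. 3 and FIG. 4 examples: one attribute lookup, and a relationship lookup chained into an attribute lookup.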
In this embodiment, a retrieval tree corresponding to the retrieval type of the retrieval logic formula is generated. The retrieval tree indicates how to retrieve information from the databases, which ensures that information relevant to the user query sentence can be obtained accurately.
In some optional implementations of this embodiment, the information retrieval module 307 includes a depth traversal sub-module, an information retrieval sub-module, an information display sub-module and an information upload sub-module, wherein:
The depth traversal sub-module is configured to perform a depth-first traversal of the retrieval tree to determine the retrieval strategy corresponding to the retrieval tree, and to determine the information type based on the retrieval strategy.
The information retrieval sub-module is configured to retrieve information from the databases according to the retrieval strategy to obtain answer information.
The information display sub-module is configured to display the answer information according to the information type.
The information upload sub-module is configured to upload the answer information to the blockchain.
In this embodiment, a depth-first traversal of the retrieval tree yields the retrieval strategy, and the information type is determined based on that strategy. With the retrieval strategy, the server can obtain the required information from the databases faster and more accurately; with the information type, the answer information can be displayed intelligently; and the answer information is uploaded to the blockchain to ensure its security, fairness and transparency.
In some optional implementations of this embodiment, the information display sub-module includes a text display unit, a bar chart display unit and a line chart display unit, wherein:
The text display unit is configured to display the answer information as text when the information type is single entity, single attribute, or entity relationship.
The bar chart display unit is configured to display the answer information as a bar chart when the information type is single entity, multiple attributes, or multiple entities, single attribute.
The line chart display unit is configured to display the answer information as a line chart when the information type is attribute change trend.
In this embodiment, the answer information is displayed as text or graphics according to the type of information retrieved, which makes the display of answer information more intelligent.
In some optional implementations of this embodiment, the speech-semantics-based information retrieval device 300 further includes a sentence update module and an association module, wherein:
The sentence update module is configured to set the template query sentence as an inventory query sentence so as to update the question corpus.
The association module is configured to associate the inventory query sentence newly added to the question corpus with the updated retrieval logic formula.
In this embodiment, the template query sentence is added to the question corpus and matched with a retrieval logic formula, which enriches the inventory query sentences in the question corpus and improves the system's ability to process a variety of user query sentences.
To solve the above technical problems, the embodiments of this application also provide a computer device. Refer to FIG. 9, which is a block diagram of the basic structure of the computer device of this embodiment.
The computer device 4 includes a memory 41, a processor 42 and a network interface 43 that are communicatively connected to one another via a system bus. Note that the figure shows only the computer device 4 with components 41-43; it is not required that all of the illustrated components be implemented, and more or fewer components may be implemented instead. Those skilled in the art will understand that the computer device here is a device capable of automatically performing numerical calculation and/or information processing according to preset or stored instructions. Its hardware includes, but is not limited to, a microprocessor, an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), a digital signal processor (DSP) and embedded devices.
The computer device may be a computing device such as a desktop computer, a notebook computer, a palmtop computer or a cloud server. The computer device can interact with the user through a keyboard, a mouse, a remote control, a touch panel or a voice control device.
The memory 41 includes at least one type of computer-readable storage medium, which may be non-volatile or volatile. The computer-readable storage medium includes flash memory, a hard disk, a multimedia card, card-type memory (for example, SD or DX memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, a magnetic disk, an optical disc and the like. In some embodiments, the memory 41 may be an internal storage unit of the computer device 4, such as its hard disk or internal memory. In other embodiments, the memory 41 may be an external storage device of the computer device 4, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card or a flash card fitted to the computer device 4. The memory 41 may also include both an internal storage unit of the computer device 4 and an external storage device. In this embodiment, the memory 41 is generally used to store the operating system and the various application software installed on the computer device 4, such as the computer-readable instructions of the speech-semantics-based information retrieval method. The memory 41 can also be used to temporarily store various data that has been output or is to be output.
In some embodiments, the processor 42 may be a central processing unit (CPU), a controller, a microcontroller, a microprocessor or another data processing chip. The processor 42 is generally used to control the overall operation of the computer device 4. In this embodiment, the processor 42 is configured to run the computer-readable instructions stored in the memory 41 or to process data, for example to run the computer-readable instructions of the speech-semantics-based information retrieval method.
The network interface 43 may include a wireless network interface or a wired network interface, and is generally used to establish a communication connection between the computer device 4 and other electronic devices.
The computer device provided in this embodiment can execute the steps of the speech-semantics-based information retrieval method described above. These steps may be the steps of the speech-semantics-based information retrieval method in any of the above embodiments.
In this embodiment, the instance entities in the acquired user query sentence are first replaced to obtain a template query sentence, which strips the personalized content from the user query sentence. The similarity between the template query sentence and each inventory query sentence in the corpus is then calculated, and the inventory query sentence matching the user query sentence, together with its retrieval logic formula, is determined according to the similarity. This improves the ability to handle user query sentences of various forms and ensures the accuracy and usability of information retrieval. A retrieval tree is generated from the retrieval logic formula; the retrieval tree indicates how to retrieve information from multiple databases, so retrieval based on it can accurately obtain the information targeted by the user query sentence, further ensuring the accuracy of information retrieval.
This application also provides another implementation, namely a computer-readable storage medium storing computer-readable instructions for speech-semantics-based information retrieval. The instructions can be executed by at least one processor so that the at least one processor performs the steps of the speech-semantics-based information retrieval method described above.
In this embodiment, the instance entities in the acquired user query sentence are first replaced to obtain a template query sentence, which strips the personalized content from the user query sentence. The similarity between the template query sentence and each inventory query sentence in the corpus is then calculated, and the inventory query sentence matching the user query sentence, together with its retrieval logic formula, is determined according to the similarity. This improves the ability to handle user query sentences of various forms and ensures the accuracy and usability of information retrieval. A retrieval tree is generated from the retrieval logic formula; the retrieval tree indicates how to retrieve information from multiple databases, so retrieval based on it can accurately obtain the information targeted by the user query sentence, further ensuring the accuracy of information retrieval.
The blockchain referred to in this application is a new application mode of computer technologies such as distributed data storage, peer-to-peer transmission, consensus mechanisms and encryption algorithms. A blockchain is essentially a decentralized database: a chain of data blocks linked to one another by cryptographic methods. Each data block contains information on a batch of network transactions, which is used to verify the validity of that information (anti-counterfeiting) and to generate the next block. A blockchain may include an underlying blockchain platform, a platform product service layer and an application service layer.
通过以上的实施方式的描述,本领域的技术人员可以清楚地了解到上述实施例方法可借助软件加必需的通用硬件平台的方式来实现,当然也可以通过硬件,但很多情况下前者是更佳的实施方式。基于这样的理解,本申请的技术方案本质上或者说对现有技术做出贡献的部分可以以软件产品的形式体现出来,该计算机软件产品存储在一个存储介质(如ROM/RAM、磁碟、光盘)中,包括若干指令用以使得一台终端设备(可以是手机,计算机,服务器,空调器,或者网络设备等)执行本申请各个实施例所述的方法。Through the description of the above implementation manners, those skilled in the art can clearly understand that the above-mentioned embodiment method can be implemented by means of software plus the necessary general hardware platform, of course, it can also be implemented by hardware, but in many cases the former is better.的实施方式。 Based on this understanding, the technical solution of this application essentially or the part that contributes to the existing technology can be embodied in the form of a software product, and the computer software product is stored in a storage medium (such as ROM/RAM, magnetic disk, The optical disc) includes several instructions to make a terminal device (which can be a mobile phone, a computer, a server, an air conditioner, or a network device, etc.) execute the methods described in the various embodiments of the present application.
Obviously, the embodiments described above are only some, not all, of the embodiments of this application. The drawings show preferred embodiments of this application but do not limit its patent scope. This application can be implemented in many different forms; these embodiments are provided so that the disclosure of this application is understood more thoroughly and comprehensively. Although this application has been described in detail with reference to the foregoing embodiments, those skilled in the art can still modify the technical solutions described in the foregoing specific implementations, or equivalently replace some of their technical features. Any equivalent structure made using the contents of the description and drawings of this application, and applied directly or indirectly in other related technical fields, likewise falls within the scope of patent protection of this application.

Claims (20)

  1. An information retrieval method based on speech semantics, comprising the following steps:
    acquiring an input user query sentence;
    parsing the user query sentence and replacing an instance entity in the user query sentence with a conceptual entity to obtain a template query sentence, the conceptual entity being the entity type to which the instance entity belongs;
    calculating the similarity between the template query sentence and each stored query sentence in a question corpus;
    determining, according to the calculated similarity, a stored query sentence matching the template query sentence and a retrieval logic expression corresponding to the stored query sentence;
    updating the retrieval logic expression according to the instance entity;
    generating a retrieval tree based on the updated retrieval logic expression; and
    performing information retrieval on a database according to the retrieval tree, and displaying the retrieved answer information.
  2. The information retrieval method based on speech semantics according to claim 1, wherein the step of parsing the user query sentence and replacing an instance entity in the user query sentence with a conceptual entity to obtain a template query sentence specifically comprises:
    identifying the instance entity in the user query sentence, and determining the entity type of the instance entity through semantic recognition to obtain a conceptual entity representing the entity type;
    querying a standard entity list for the standard entity corresponding to the instance entity; and
    replacing the instance entity in the user query sentence with the conceptual entity to obtain the template query sentence, and storing the instance entity in association with the standard entity.
  3. The information retrieval method based on speech semantics according to claim 2, wherein the step of updating the retrieval logic expression according to the instance entity specifically comprises:
    acquiring the standard entity stored in association with the instance entity; and
    replacing the standard entity in the retrieval logic expression with the acquired standard entity.
  4. The information retrieval method based on speech semantics according to claim 1, wherein the step of generating a retrieval tree based on the updated retrieval logic expression specifically comprises:
    identifying the retrieval type of the retrieval logic expression;
    when the retrieval type is single-triple, single-medium retrieval, generating a single-triple, single-medium retrieval tree; and
    when the retrieval type is multi-triple, multi-medium retrieval, generating a multi-triple, multi-medium retrieval tree.
  5. The information retrieval method based on speech semantics according to claim 1, wherein the step of performing information retrieval on a database according to the retrieval tree and displaying the retrieved answer information specifically comprises:
    performing a depth-first traversal of the retrieval tree to determine the retrieval strategy corresponding to the retrieval tree, and determining an information type based on the retrieval strategy;
    performing information retrieval on the database according to the retrieval strategy to obtain answer information;
    displaying the answer information according to the information type; and
    uploading the answer information to a blockchain.
  6. The information retrieval method based on speech semantics according to claim 5, wherein the step of displaying the answer information according to the information type specifically comprises:
    when the information type is a single attribute of a single entity or an entity relationship, displaying the answer information as text;
    when the information type is multiple attributes of a single entity or a single attribute of multiple entities, displaying the answer information as a bar chart; and
    when the information type is an attribute change trend, displaying the answer information as a line chart.
  7. The information retrieval method based on speech semantics according to any one of claims 1-6, wherein, after the step of performing information retrieval on a database according to the retrieval tree and displaying the retrieved answer information, the method further comprises:
    setting the template query sentence as a stored query sentence to update the question corpus; and
    associating the stored query sentence newly added to the question corpus with the updated retrieval logic expression.
  8. An information retrieval device based on speech semantics, comprising:
    a sentence acquisition module, configured to acquire an input user query sentence;
    an entity replacement module, configured to parse the user query sentence and replace an instance entity in the user query sentence with a conceptual entity to obtain a template query sentence, the conceptual entity being the entity type to which the instance entity belongs;
    a similarity calculation module, configured to calculate the similarity between the template query sentence and each stored query sentence in a question corpus;
    a sentence determination module, configured to determine, according to the calculated similarity, a stored query sentence matching the template query sentence and a retrieval logic expression corresponding to the stored query sentence;
    a logic expression update module, configured to update the retrieval logic expression according to the instance entity;
    a retrieval tree generation module, configured to generate a retrieval tree based on the updated retrieval logic expression; and
    an information retrieval module, configured to perform information retrieval on a database according to the retrieval tree and display the retrieved answer information.
  9. A computer device, comprising a memory and a processor, the memory storing computer-readable instructions, wherein the processor implements the following steps when executing the computer-readable instructions:
    acquiring an input user query sentence;
    parsing the user query sentence and replacing an instance entity in the user query sentence with a conceptual entity to obtain a template query sentence, the conceptual entity being the entity type to which the instance entity belongs;
    calculating the similarity between the template query sentence and each stored query sentence in a question corpus;
    determining, according to the calculated similarity, a stored query sentence matching the template query sentence and a retrieval logic expression corresponding to the stored query sentence;
    updating the retrieval logic expression according to the instance entity;
    generating a retrieval tree based on the updated retrieval logic expression; and
    performing information retrieval on a database according to the retrieval tree, and displaying the retrieved answer information.
  10. The computer device according to claim 9, wherein the step of parsing the user query sentence and replacing an instance entity in the user query sentence with a conceptual entity to obtain a template query sentence specifically comprises:
    identifying the instance entity in the user query sentence, and determining the entity type of the instance entity through semantic recognition to obtain a conceptual entity representing the entity type;
    querying a standard entity list for the standard entity corresponding to the instance entity; and
    replacing the instance entity in the user query sentence with the conceptual entity to obtain the template query sentence, and storing the instance entity in association with the standard entity.
  11. The computer device according to claim 10, wherein the step of updating the retrieval logic expression according to the instance entity specifically comprises:
    acquiring the standard entity stored in association with the instance entity; and
    replacing the standard entity in the retrieval logic expression with the acquired standard entity.
  12. The computer device according to claim 9, wherein the step of generating a retrieval tree based on the updated retrieval logic expression specifically comprises:
    identifying the retrieval type of the retrieval logic expression;
    when the retrieval type is single-triple, single-medium retrieval, generating a single-triple, single-medium retrieval tree; and
    when the retrieval type is multi-triple, multi-medium retrieval, generating a multi-triple, multi-medium retrieval tree.
  13. The computer device according to claim 9, wherein the step of performing information retrieval on a database according to the retrieval tree and displaying the retrieved answer information specifically comprises:
    performing a depth-first traversal of the retrieval tree to determine the retrieval strategy corresponding to the retrieval tree, and determining an information type based on the retrieval strategy;
    performing information retrieval on the database according to the retrieval strategy to obtain answer information;
    displaying the answer information according to the information type; and
    uploading the answer information to a blockchain.
  14. The computer device according to claim 13, wherein the step of displaying the answer information according to the information type specifically comprises:
    when the information type is a single attribute of a single entity or an entity relationship, displaying the answer information as text;
    when the information type is multiple attributes of a single entity or a single attribute of multiple entities, displaying the answer information as a bar chart; and
    when the information type is an attribute change trend, displaying the answer information as a line chart.
  15. A computer-readable storage medium storing computer-readable instructions, wherein the following steps are implemented when the computer-readable instructions are executed by a processor:
    acquiring an input user query sentence;
    parsing the user query sentence and replacing an instance entity in the user query sentence with a conceptual entity to obtain a template query sentence, the conceptual entity being the entity type to which the instance entity belongs;
    calculating the similarity between the template query sentence and each stored query sentence in a question corpus;
    determining, according to the calculated similarity, a stored query sentence matching the template query sentence and a retrieval logic expression corresponding to the stored query sentence;
    updating the retrieval logic expression according to the instance entity;
    generating a retrieval tree based on the updated retrieval logic expression; and
    performing information retrieval on a database according to the retrieval tree, and displaying the retrieved answer information.
  16. The computer-readable storage medium according to claim 15, wherein the step of parsing the user query sentence and replacing an instance entity in the user query sentence with a conceptual entity to obtain a template query sentence specifically comprises:
    identifying the instance entity in the user query sentence, and determining the entity type of the instance entity through semantic recognition to obtain a conceptual entity representing the entity type;
    querying a standard entity list for the standard entity corresponding to the instance entity; and
    replacing the instance entity in the user query sentence with the conceptual entity to obtain the template query sentence, and storing the instance entity in association with the standard entity.
  17. The computer-readable storage medium according to claim 16, wherein the step of updating the retrieval logic expression according to the instance entity specifically comprises:
    acquiring the standard entity stored in association with the instance entity; and
    replacing the standard entity in the retrieval logic expression with the acquired standard entity.
  18. The computer-readable storage medium according to claim 15, wherein the step of generating a retrieval tree based on the updated retrieval logic expression specifically comprises:
    identifying the retrieval type of the retrieval logic expression;
    when the retrieval type is single-triple, single-medium retrieval, generating a single-triple, single-medium retrieval tree; and
    when the retrieval type is multi-triple, multi-medium retrieval, generating a multi-triple, multi-medium retrieval tree.
  19. The computer-readable storage medium according to claim 15, wherein the step of performing information retrieval on a database according to the retrieval tree and displaying the retrieved answer information specifically comprises:
    performing a depth-first traversal of the retrieval tree to determine the retrieval strategy corresponding to the retrieval tree, and determining an information type based on the retrieval strategy;
    performing information retrieval on the database according to the retrieval strategy to obtain answer information;
    displaying the answer information according to the information type; and
    uploading the answer information to a blockchain.
  20. The computer-readable storage medium according to claim 19, wherein the step of displaying the answer information according to the information type specifically comprises:
    when the information type is a single attribute of a single entity or an entity relationship, displaying the answer information as text;
    when the information type is multiple attributes of a single entity or a single attribute of multiple entities, displaying the answer information as a bar chart; and
    when the information type is an attribute change trend, displaying the answer information as a line chart.
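
As an informal illustration of the depth-first traversal and the display-format selection recited in the claims above, a minimal sketch might look as follows. The `Node` structure, the node labels, the information-type keys, and the function names are all assumptions introduced for this example; the claims do not prescribe a particular tree representation or naming.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One node of a hypothetical retrieval tree."""
    label: str
    children: list = field(default_factory=list)


def depth_first(node):
    """Return node labels in depth-first (pre-order) visiting order."""
    order = [node.label]
    for child in node.children:
        order.extend(depth_first(child))
    return order


# Mapping from information type to display format, following the three
# cases listed in the claims (key names are illustrative).
DISPLAY_FORMATS = {
    "single_entity_single_attribute": "text",
    "entity_relationship": "text",
    "single_entity_multi_attribute": "bar_chart",
    "multi_entity_single_attribute": "bar_chart",
    "attribute_trend": "line_chart",
}


def display_format(info_type):
    """Pick the display format for a given information type."""
    return DISPLAY_FORMATS[info_type]


if __name__ == "__main__":
    # A toy tree: two triples combined, the first resolved against one store.
    tree = Node("AND", [Node("triple_1", [Node("db_a")]), Node("triple_2")])
    print(depth_first(tree))
    print(display_format("attribute_trend"))
```

A table-driven mapping keeps the three display cases in one place, so adding a further information type would only require a new dictionary entry.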
PCT/CN2020/117387 2020-05-22 2020-09-24 Speech semantics-based information search method and related device WO2021135439A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010440491.7 2020-05-22
CN202010440491.7A CN111782763A (en) 2020-05-22 2020-05-22 Information retrieval method based on voice semantics and related equipment thereof

Publications (1)

Publication Number Publication Date
WO2021135439A1 (en) 2021-07-08

Family

ID=72753790

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/117387 WO2021135439A1 (en) 2020-05-22 2020-09-24 Speech semantics-based information search method and related device

Country Status (2)

Country Link
CN (1) CN111782763A (en)
WO (1) WO2021135439A1 (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111782763A (en) * 2020-05-22 2020-10-16 平安科技(深圳)有限公司 Information retrieval method based on voice semantics and related equipment thereof
CN112287069B (en) * 2020-10-29 2023-07-25 平安科技(深圳)有限公司 Information retrieval method and device based on voice semantics and computer equipment
CN112463432A (en) * 2020-12-08 2021-03-09 广州品唯软件有限公司 Inspection method, device and system based on index data
CN112527997B (en) * 2020-12-18 2024-01-23 中国南方电网有限责任公司 Intelligent question-answering method and system based on power grid field scheduling scene knowledge graph
CN112613176A (en) * 2020-12-23 2021-04-06 贝壳技术有限公司 Slow SQL statement prediction method and system
CN114860894A (en) * 2021-01-20 2022-08-05 京东科技控股股份有限公司 Method and device for querying knowledge base, computer equipment and storage medium
CN113535919B (en) * 2021-07-16 2022-11-08 北京元年科技股份有限公司 Data query method and device, computer equipment and storage medium
CN117520483A (en) * 2024-01-04 2024-02-06 北京奇虎科技有限公司 Information verification method and device based on large model

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130036107A1 (en) * 2011-08-07 2013-02-07 Citizennet Inc. Systems and methods for trend detection using frequency analysis
CN107885874A (en) * 2017-11-28 2018-04-06 上海智臻智能网络科技股份有限公司 Data query method and apparatus, computer equipment and computer-readable recording medium
CN108170859A (en) * 2018-01-22 2018-06-15 北京百度网讯科技有限公司 Method, apparatus, storage medium and the terminal device of speech polling
CN111126073A (en) * 2019-12-23 2020-05-08 中国建设银行股份有限公司 Semantic retrieval method and device
CN111782763A (en) * 2020-05-22 2020-10-16 平安科技(深圳)有限公司 Information retrieval method based on voice semantics and related equipment thereof

Also Published As

Publication number Publication date
CN111782763A (en) 2020-10-16

Legal Events

Date Code Title Description
121 Ep: the EPO has been informed by WIPO that EP was designated in this application (Ref document number: 20908574; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: PCT application non-entry in European phase (Ref document number: 20908574; Country of ref document: EP; Kind code of ref document: A1)