CN111061829A

CN111061829A - Tree type retrieval method and device

Info

Publication number: CN111061829A
Application number: CN201911293832.6A
Authority: CN
Inventors: 淡宏斌; 邵明雪; 熊小安; 陈方正
Original assignee: Beijing Softcom Smart City Technology Co Ltd
Current assignee: Beijing Softcom Smart City Technology Co Ltd
Priority date: 2019-12-16
Filing date: 2019-12-16
Publication date: 2020-04-24

Abstract

The invention provides a tree-type retrieval method comprising: forming a tree model according to vocabulary information; and matching the characters of a text to be analyzed against the tree model to obtain a text result containing the corresponding vocabulary words. The tree-type retrieval method and device improve retrieval efficiency.

Description

Tree type retrieval method and device

Technical Field

The invention relates to the technical field of retrieval, and in particular to a tree-type retrieval method and a tree-type retrieval device.

Background

In a rapidly developing information society, searching for information quickly and accurately is particularly important, and it remains a difficult problem for public opinion analysis, information search, and similar tasks. For example, sentiment analysis of an article requires searching the target article against a known word bank so that the entries of the word bank are accurately matched in the article.

In the prior art, the article is searched repeatedly in a loop, once per entry of the word bank. This silently consumes excessive time and ultimately fails to produce satisfactory results.
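For contrast, the prior-art approach can be sketched as one full scan of the article per word-bank entry. This is an illustrative reading of the background, not code from any cited system: `naive_search` is a hypothetical name, each scan is assumed to be a plain substring search, and the sample text is an assumed Chinese sentence.

```python
def naive_search(text: str, lexicon: list[str]) -> list[tuple[int, str]]:
    """Return (position, word) for every occurrence of every lexicon word.

    The article is rescanned once per lexicon entry.
    """
    hits = []
    for word in lexicon:               # one pass over the text per word
        start = text.find(word)
        while start != -1:
            hits.append((start, word))
            start = text.find(word, start + 1)
    return sorted(hits)


print(naive_search("我是中国人，我爱我的祖国。", ["中国", "中国人", "祖国"]))
```

The work grows roughly with the number of entries times the length of the article, and overlapping entries such as 中国 and 中国人 are both reported at the same position.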

Disclosure of Invention

The invention aims to provide a tree-type retrieval method and a tree-type retrieval device that improve retrieval efficiency.

To achieve this purpose, the invention adopts the following technical solutions:

In a first aspect, the present invention provides a tree search method, comprising:

forming a tree model according to vocabulary information;

and matching the characters of a text to be analyzed against the tree model to obtain a text result corresponding to the vocabulary.

Further, forming the tree model according to the vocabulary information comprises:

splitting each word of the vocabulary into characters and assigning each character a corresponding child node;

assigning state information to the child nodes according to the relationships between different words and the number of characters each word contains;

and assigning corresponding lexical meanings to the child nodes according to the word information and the state information.
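The three construction steps can be sketched in Python as follows. This is a minimal sketch, not the patented implementation: the `Node` class, the string labels for the states, and the input mapping from each word to its meaning are assumptions. Meanings are attached to the node of each word's last character (a node in state stop or longest), which is one reading of "according to the word information and the state information".

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    children: dict = field(default_factory=dict)  # character -> Node
    is_word_end: bool = False
    state: Optional[str] = None     # "continue", "stop" or "longest"
    meaning: Optional[dict] = None  # e.g. tag and/or sentiment information


def build_tree(vocabulary: dict) -> Node:
    """Build the tree from a hypothetical mapping of word -> lexical meaning."""
    root = Node()

    # Step 1: split each word into characters, one child node per character.
    for word in vocabulary:
        node = root
        for ch in word:
            node = node.children.setdefault(ch, Node())
        node.is_word_end = True

    # Step 2: assign states from how the words relate to one another.
    def assign_states(node: Node) -> None:
        for child in node.children.values():
            if not child.is_word_end:
                child.state = "continue"
            elif child.children:
                child.state = "stop"      # a longer word continues past here
            else:
                child.state = "longest"
            assign_states(child)

    assign_states(root)

    # Step 3: attach each word's meaning to the node of its last character.
    for word, meaning in vocabulary.items():
        node = root
        for ch in word:
            node = node.children[ch]
        node.meaning = meaning
    return root


tree = build_tree({"中国": {"tag": "place"}, "中国人": {"tag": "people"}})
guo = tree.children["中"].children["国"]
print(guo.state, guo.meaning)  # stop {'tag': 'place'}
```

Assigning states in a separate pass after all words are inserted means a state never has to be revised when a longer word arrives later.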

Further, the lexical meaning includes tag information and/or sentiment information.

Further, the state information is continue, stop, or longest.
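The three states can be read as a function of two facts about a node: whether some word ends there and whether it has children. The sketch below encodes that reading; the `State` enum and the `state_of` helper are hypothetical names, and the mapping is an interpretation of the 中国/中国人 example in the detailed description rather than a definition given in the text.

```python
from enum import Enum


class State(Enum):
    CONTINUE = "continue"  # inside a word; no word ends at this node
    STOP = "stop"          # a word ends here and a longer word continues
    LONGEST = "longest"    # a word ends here and nothing extends it


def state_of(is_word_end: bool, has_children: bool) -> State:
    """Map two structural facts about a (non-root) node to its state."""
    if not is_word_end:
        # A non-root node where no word ends lies on the path to some
        # longer word, so it always has children and is CONTINUE.
        return State.CONTINUE
    return State.STOP if has_children else State.LONGEST


for flags in [(False, True), (True, True), (True, False)]:
    print(flags, state_of(*flags).value)
```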

Further, matching the characters of the text to be analyzed against the tree model to obtain a text result corresponding to the vocabulary comprises:

segmenting the text to be analyzed into characters;

matching the characters, in their order in the text to be analyzed, to the corresponding nodes of the tree model, starting from the child nodes of the root node of the tree model;

if the state information of the node matched by the current character is continue or stop, matching the next character starting from the child nodes of the node matched by the current character;

and if the state information of the node matched by the current character is longest, updating the text result with the current matching result, and matching the next character again starting from the child nodes of the root node of the tree model.

Further, if the state information of the node matched by the current character is continue or stop, the method further comprises:

if the current character does not match any corresponding node, advancing to the next character and starting matching again from the child nodes of the root node.

Further, matching the next character starting from the child nodes of the node matched by the current character when that node's state information is continue or stop comprises:

if the state information of the node matched by the current character is continue, matching the next character starting from the child nodes of that node;

or, if the state information of the node matched by the current character is stop, adding the current matching result to the text result and matching the next character starting from the child nodes of that node.
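The matching steps above can be sketched in Python as a single left-to-right walk. This is a minimal sketch under stated assumptions, not the patented implementation: `Node`, `build_tree` and `match` are hypothetical names; a node's state is derived from whether a word ends there and whether it has children; a mismatch partway through a word restarts at the root with the same character (the text only says matching restarts from the root); and "adding" at a stop node and "updating" at a longest node are read as keeping only the longest word matched from each starting position. The sample sentence is an assumed Chinese rendering of the example in the detailed description.

```python
class Node:
    """One character of the tree; its state is derived, not stored."""

    def __init__(self) -> None:
        self.children: dict[str, "Node"] = {}
        self.is_end = False  # True if some vocabulary word ends at this node

    @property
    def state(self) -> str:
        if not self.is_end:
            return "continue"
        return "stop" if self.children else "longest"


def build_tree(words: list[str]) -> Node:
    root = Node()
    for word in words:
        node = root
        for ch in word:
            node = node.children.setdefault(ch, Node())
        node.is_end = True
    return root


def match(text: str, root: Node) -> list[str]:
    results: list[str] = []
    node, start, recorded = root, 0, False
    i = 0
    while i < len(text):
        child = node.children.get(text[i])
        if child is None:
            if node is root:
                i += 1       # no word starts with this character: skip it
            else:
                node = root  # assumption: retry this character from the root
            continue
        if node is root:
            start, recorded = i, False  # a new candidate word starts here
        word = text[start:i + 1]
        if child.state == "continue":
            node = child
        else:
            # "stop" or "longest": record the word, replacing the shorter
            # word already recorded for the same starting position, if any.
            if recorded:
                results[-1] = word
            else:
                results.append(word)
                recorded = True
            node = root if child.state == "longest" else child
        i += 1
    return results


tree = build_tree(["中国", "中国人", "祖国"])
print(match("我是中国人，我爱我的祖国。", tree))  # ['中国人', '祖国']
```

Deriving the state from the node, instead of storing it, keeps the state correct when a longer word is added later. Because the walk never backtracks, a word that starts inside a failed longer candidate is missed: with only 中国人 and 国家 in the vocabulary, 中国家 yields no match.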

In a second aspect, the present invention provides a tree search apparatus comprising:

a model building module, configured to form a tree model according to the vocabulary information;

and a matching module, configured to match the characters of the text to be analyzed against the tree model to obtain a text result corresponding to the vocabulary.

Further, the model building module is specifically configured to split the vocabulary into characters and assign each character a corresponding child node.

Further, the matching module is specifically configured to segment the text to be analyzed into characters,

and to match the characters, in their order in the text to be analyzed, to the corresponding child nodes of the tree model, starting from the child nodes of the root node of the tree model.

The beneficial effects of the invention are as follows:

By forming a tree model from the vocabulary, the invention can match information efficiently and accurately to complete information retrieval. By matching the characters of the text to be analyzed against the tree model to obtain the text result, the text is matched against all words of the vocabulary in a single retrieval pass rather than one pass per word, the corresponding word information is obtained, and the matching time is shortened.

Drawings

Fig. 1 is a schematic flowchart of a tree-based search method according to an embodiment of the present invention.

Fig. 2 is a schematic structural diagram of a tree search apparatus according to a second embodiment of the present invention.

Detailed Description

To make the technical problems solved, the technical solutions adopted, and the technical effects achieved by the present invention clearer, the technical solutions of the embodiments of the present invention are described in further detail below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention.

Example one

This embodiment provides a tree-type search method that matches a text against all words of the vocabulary in a single search, thereby obtaining the corresponding word information and reducing the matching time.

Fig. 1 is a schematic flowchart of a tree-based search method according to an embodiment of the present invention. As shown in fig. 1, the method specifically includes the following steps:

S11: Form a tree model according to the vocabulary information.

Specifically, each word is first split into characters, and each character is assigned a corresponding child node. For the characters of one word, the first character is set as a child of the root node, the second character as a child of the first character's node, the third character as a child of the second character's node, and so on. For example, for the word 中国 (China), the first character 中 ("middle") is set as a child of the root node, and the second character 国 ("country") is set as a child of the 中 node. For the word 祖国 (motherland), the first character 祖 ("ancestor") is set as a child of the root node, and 国 is set as a child of the 祖 node.

When one word contains another as its leading part, their identical characters share the same nodes, and each differing character is set as a child of the node of the preceding character. For example, if the vocabulary includes 中国 (China) and 中国人 (Chinese person), the shared characters 中 and 国 each share one node, and the differing character 人 ("person") is set as a child of the node of the preceding character 国.
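Node sharing can be checked by counting nodes. This sketch uses nested Python dictionaries as a stand-in for the tree model (an assumption; no data structure is prescribed) and omits state and meaning information. Note that 中国 and 祖国 end with the same character but do not share its node, because only identical leading characters lie on the same path.

```python
def insert(tree: dict, word: str) -> None:
    """Add `word` to a nested-dict tree, one level per character."""
    node = tree
    for ch in word:
        # Reuse the child for a character already on this path, else create it.
        node = node.setdefault(ch, {})


def count_nodes(tree: dict) -> int:
    """Count every node below the root."""
    return sum(1 + count_nodes(child) for child in tree.values())


tree: dict = {}
for word in ["中国", "中国人", "祖国"]:
    insert(tree, word)

print(count_nodes(tree))  # 5: 中 -> 国 -> 人 and 祖 -> 国
print(sorted(tree))       # ['中', '祖'], the root's children
```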

Next, state information is assigned to the child nodes according to the relationships between different words and the number of characters each word contains. The state information is continue, stop, or longest.

For example, the word 中国 (China) contains 2 characters: the state of the 中 node is continue, and the state of the 国 node is longest. When the word 中国人 (Chinese person) is added, because it contains the word 中国, the state of the existing 国 node changes to stop, and the state of the 人 node is longest.
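Because the state of an existing node can change when a longer word is added, as with 国 in this example, a tree that stores states explicitly must update them during insertion. A minimal sketch, with hypothetical `Node` and `add_word` names:

```python
class Node:
    def __init__(self) -> None:
        self.children: dict[str, "Node"] = {}
        self.state = "continue"  # the root's state is never used


def add_word(root: Node, word: str) -> None:
    """Insert `word`, updating states of existing nodes where needed."""
    node = root
    for ch in word:
        if ch not in node.children:
            # Growing a child under a node where a word ended: that word
            # is now contained in a longer one, so "longest" becomes "stop".
            if node is not root and node.state == "longest":
                node.state = "stop"
            node.children[ch] = Node()
        node = node.children[ch]
    # The last character ends a word; "longest" unless a longer word exists.
    node.state = "stop" if node.children else "longest"


root = Node()
add_word(root, "中国")
zhong = root.children["中"]
guo = zhong.children["国"]
print(zhong.state, guo.state)               # continue longest
add_word(root, "中国人")
print(guo.state, guo.children["人"].state)  # stop longest
```

The final line of `add_word` also covers the reverse insertion order: adding 中国 after 中国人 marks the existing 国 node as stop.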

Lexical meanings are then assigned to the corresponding child nodes according to the word information and the state information. The lexical meaning includes tag information and/or sentiment information.
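Attaching lexical meanings can be sketched by storing a payload on the node of each word's last character, that is, a node whose state is stop or longest. The `Meaning` fields and the sample tag and sentiment values below are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Meaning:
    tag: Optional[str] = None          # tag information, e.g. a category
    sentiment: Optional[float] = None  # sentiment information, e.g. polarity


@dataclass
class Node:
    children: dict = field(default_factory=dict)  # character -> Node
    meaning: Optional[Meaning] = None  # set only on a word's last character


def add_word(root: Node, word: str, meaning: Meaning) -> None:
    node = root
    for ch in word:
        node = node.children.setdefault(ch, Node())
    node.meaning = meaning


def lookup(root: Node, word: str) -> Optional[Meaning]:
    """Return the meaning stored for `word`, or None if it is not a word."""
    node = root
    for ch in word:
        node = node.children.get(ch)
        if node is None:
            return None
    return node.meaning


root = Node()
add_word(root, "中国", Meaning(tag="place"))
add_word(root, "祖国", Meaning(tag="place", sentiment=0.8))
print(lookup(root, "祖国"))  # Meaning(tag='place', sentiment=0.8)
print(lookup(root, "中"))    # None: a node exists, but no word ends there
```

Storing the meaning on the node lets a match that ends there return its tag and sentiment directly, without a second lookup in the word bank.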

S12: Match the characters of the text to be analyzed against the tree model to obtain the text result corresponding to the vocabulary.

Specifically, the text to be analyzed is first segmented into characters, which serve as the matching units.

The characters are then matched, in their order in the text to be analyzed, to the corresponding child nodes of the tree model, starting from the child nodes of the root node.

Preferably, the method further comprises: if the current character does not match any corresponding node, advancing to the next character and starting matching again from the child nodes of the root node. In this way, characters that carry no lexical meaning are excluded, and inaccurate text results are avoided.

If the state information of the node matched by the current character is continue or stop, matching of the next character starts from the child nodes of that node.

Preferably, if the state information of the node matched by the current character is continue, the next character is matched starting from the child nodes of that node.

For example, the text to be analyzed is "I am Chinese, and I love my motherland." The text is segmented into characters, and matching starts by looking for the first character, 我 ("I"), among the child nodes of the root node of the tree model. No corresponding node is found, so matching advances to the next character, 是 ("am"), which also matches no node. Matching advances to the next character, 中, which matches a node whose state is continue. Matching then advances to the next character, 国, and looks for it among the child nodes of the 中 node; the matched node's state is stop, so the current matching result 中国 (China) is added to the text result. Because the state is stop, matching continues from the child nodes of the 国 node with the next character, 人; the matched node's state is longest, so the text result is updated with the current matching result 中国人 (Chinese person).

Matching then advances to the next character, 我 ("I"), and restarts from the child nodes of the root node of the tree model. This continues until the last character has been matched.
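The walk-through can be reproduced with a step-by-step trace. This sketch is hypothetical: it stores the tree as nested dictionaries, derives each node's state from whether a word ends there and whether it has children, assumes the vocabulary 中国, 中国人 and 祖国 from step S11, and uses 我是中国人，我爱我的祖国。 as an assumed Chinese rendering of the example sentence, with punctuation treated as an ordinary character that matches no node. A mismatch partway through a word is assumed to retry the same character from the root.

```python
END = ""  # sentinel key marking a word end; a text character is never ""


def build(words: list[str]) -> dict:
    root: dict = {}
    for w in words:
        node = root
        for ch in w:
            node = node.setdefault(ch, {})
        node[END] = True
    return root


def state(node: dict) -> str:
    if END not in node:
        return "continue"
    return "stop" if len(node) > 1 else "longest"


def trace(text: str, root: dict) -> list[str]:
    """Return one log line per character describing the matching step."""
    log: list[str] = []
    node, start, recorded = root, 0, False
    for i, ch in enumerate(text):
        if ch not in node and node is not root:
            node = root  # assumption: a mid-word miss retries ch at the root
        if ch not in node:
            log.append(f"{ch}: no match, skip")
            continue
        if node is root:
            start, recorded = i, False
        node = node[ch]
        s, word = state(node), text[start:i + 1]
        if s == "continue":
            log.append(f"{ch}: continue")
            continue
        action = "update to" if recorded else "add"
        recorded = True
        if s == "stop":
            log.append(f"{ch}: stop, {action} {word}")
        else:
            log.append(f"{ch}: longest, {action} {word}, back to root")
            node = root
    return log


for line in trace("我是中国人，我爱我的祖国。", build(["中国", "中国人", "祖国"])):
    print(line)
```

Lines 3 to 5 of the printed trace correspond to the 中, 国 and 人 steps described above.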

According to this embodiment, forming a tree model from the vocabulary allows information to be matched efficiently and accurately to complete information retrieval; and matching the characters of the text to be analyzed against the tree model to obtain the text result matches the text against all words of the vocabulary in a single retrieval, so that the corresponding word information is obtained and the matching time is shortened.

Example two

This embodiment provides a tree search apparatus for implementing the tree search method of the foregoing embodiment, which solves the same technical problem and achieves the same technical effects. Fig. 2 is a schematic structural diagram of a tree search apparatus according to a second embodiment of the present invention. As shown in Fig. 2, the apparatus comprises:

A model building module 10, configured to form a tree model according to the vocabulary information.

Specifically, the model building module first splits each word into characters and assigns each character a corresponding child node; then assigns state information to the child nodes according to the relationships between different words and the number of characters each word contains; and then assigns lexical meanings to the corresponding child nodes according to the word information and the state information.

A matching module 20, configured to match the characters of the text to be analyzed against the tree model to obtain a text result corresponding to the vocabulary.

Specifically, the matching module segments the text to be analyzed into characters and matches the characters, in their order in the text, to the corresponding child nodes of the tree model, starting from the child nodes of the root node.
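The two-module structure of the apparatus can be mirrored by two classes, one per module. The class and method names are hypothetical, the tree is stored as nested dictionaries, a mismatch partway through a word is assumed to retry the same character from the root, and "adding" at a stop node and "updating" at a longest node are read as keeping the longest word matched from each starting position.

```python
class ModelBuildingModule:
    """Module 10: forms the tree model from the vocabulary."""

    END = ""  # key marking the end of a word; text characters are never ""

    def build(self, words: list[str]) -> dict:
        root: dict = {}
        for word in words:
            node = root
            for ch in word:
                node = node.setdefault(ch, {})
            node[self.END] = True
        return root


class MatchingModule:
    """Module 20: matches the characters of a text against the tree model."""

    def __init__(self, tree: dict) -> None:
        self.tree = tree

    def match(self, text: str) -> list[str]:
        root = self.tree
        results: list[str] = []
        node, start, recorded = root, 0, False
        for i, ch in enumerate(text):
            if ch not in node and node is not root:
                node = root   # assumption: a mid-word miss retries ch at the root
            if ch not in node:
                continue      # no word starts with ch: skip it
            if node is root:
                start, recorded = i, False
            node = node[ch]
            if ModelBuildingModule.END in node:  # "stop" or "longest"
                if recorded:
                    results[-1] = text[start:i + 1]  # keep the longer word
                else:
                    results.append(text[start:i + 1])
                    recorded = True
                if len(node) == 1:  # "longest": nothing extends this word
                    node = root
        return results


builder = ModelBuildingModule()
matcher = MatchingModule(builder.build(["中国", "中国人", "祖国"]))
print(matcher.match("我是中国人，我爱我的祖国。"))  # ['中国人', '祖国']
```

Keeping the tree as the only shared object between the modules means the vocabulary can be rebuilt without changing the matching code.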

Through the model building module, this embodiment provides a basis for matching information efficiently and accurately; through the cooperation of the model building module and the matching module, the text is matched against all words of the vocabulary in a single retrieval, so that the corresponding word information is obtained and the matching time is shortened.

The technical principle of the present invention is described above in connection with specific embodiments. The description is made for the purpose of illustrating the principles of the invention and should not be construed in any way as limiting the scope of the invention. Based on the explanations herein, those skilled in the art could conceive of other embodiments of the present invention without inventive effort, and such embodiments fall within the scope of the present invention.

Claims

1. A tree search method, comprising:

forming a tree model according to the information of the vocabulary;

2. The tree search method of claim 1 wherein forming a tree model based on lexical information comprises:

3. The tree search method according to claim 2, wherein: the lexical meaning includes tag information and/or sentiment information.

4. The tree search method according to claim 2, wherein: the state information is continue, stop, or longest.

5. The tree search method of claim 4, wherein matching characters of the text to be analyzed with the tree model to obtain the text result corresponding to the vocabulary comprises:

segmenting the text to be analyzed into characters;

6. The tree search method of claim 5, wherein if the state information of the node corresponding to the current character is continue or stop, the method further comprises:

7. The tree search method of claim 5, wherein matching the next character starting from the child nodes of the node corresponding to the current character, when the state information of that node is continue or stop, comprises:

8. A tree search device, comprising:

9. The tree search device according to claim 8, wherein:

the model building module is specifically configured to split the vocabulary into characters and assign each character a corresponding child node;

10. The tree search device according to claim 8, wherein:

the matching module is specifically configured to segment the text to be analyzed into characters;