CN111061829A - Tree type retrieval method and device - Google Patents

Info

Publication number
CN111061829A
Authority
CN
China
Prior art keywords
node
character
matching
text
tree
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911293832.6A
Other languages
Chinese (zh)
Inventor
淡宏斌
邵明雪
熊小安
陈方正
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Softcom Smart City Technology Co Ltd
Original Assignee
Beijing Softcom Smart City Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Softcom Smart City Technology Co Ltd
Priority to CN201911293832.6A
Publication of CN111061829A
Current legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/31 Indexing; Data structures therefor; Storage structures
    • G06F16/316 Indexing structures
    • G06F16/322 Trees
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/901 Indexing; Data structures therefor; Storage structures
    • G06F16/9027 Trees
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/903 Querying
    • G06F16/90335 Query processing
    • G06F16/90344 Query processing by using string matching techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a tree type retrieval method, which comprises the following steps: forming a tree model according to the information of the vocabulary; and matching the characters of the text to be analyzed with the tree model so as to obtain a text result of the corresponding vocabulary. The tree type retrieval method and the tree type retrieval device can improve the retrieval efficiency.

Description

Tree type retrieval method and device
Technical Field
The invention relates to the technical field of retrieval, in particular to a tree type retrieval method and a tree type retrieval device.
Background
In a rapidly developing information society, searching information quickly and accurately is especially important, and it remains a difficult problem for public opinion analysis, information search, and similar tasks. For example, sentiment analysis of an article requires searching the target article against a known lexicon so that the lexicon's information is matched accurately to the article.
In the prior art, the article is searched repeatedly in a loop over the lexicon. This quietly consumes excessive time, and the final result is still unsatisfactory.
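The background does not spell out how the prior-art looped retrieval works. A minimal sketch, assuming it means scanning the whole article once per lexicon entry (the function name `naive_search` and the sample data are hypothetical), illustrates why the cost grows with the size of the lexicon:

```python
from typing import List, Tuple


def naive_search(text: str, lexicon: List[str]) -> List[Tuple[int, str]]:
    """Assumed baseline: one full scan of the text for every lexicon entry."""
    hits: List[Tuple[int, str]] = []
    for word in lexicon:
        if not word:
            continue  # skip empty entries
        pos = text.find(word)
        while pos != -1:
            hits.append((pos, word))
            pos = text.find(word, pos + 1)  # keep scanning for later occurrences
    return sorted(hits)


if __name__ == "__main__":
    # Hypothetical sample data: two lexicon entries mean two scans of the text.
    print(naive_search("abcab", ["ab", "bc"]))
```

With this baseline the text is traversed once per entry, which is the repeated work the tree model described later is meant to avoid.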
Disclosure of Invention
The invention aims to provide a tree type retrieval method and a tree type retrieval device, which can improve the retrieval efficiency.
To achieve this aim, the invention adopts the following technical solutions.
In a first aspect, the invention provides a tree search method, including:
forming a tree model according to the vocabulary information; and
matching the characters of a text to be analyzed against the tree model to obtain a text result corresponding to the vocabulary.
Further, forming the tree model according to the vocabulary information includes:
splitting each word of the vocabulary into characters and assigning each character a corresponding child node;
assigning state information to each child node according to the relationships between different words and the number of characters each word contains; and
assigning a vocabulary meaning to the corresponding child nodes according to the vocabulary information and the state information.
Further, the vocabulary meaning includes tag information and/or sentiment information.
Further, the state information is "continuous", "stop", or "longest".
Further, matching the characters of the text to be analyzed against the tree model to obtain the text result corresponding to the vocabulary includes:
segmenting the text to be analyzed into characters;
starting from the child nodes of the root node of the tree model, matching the characters to nodes of the tree model in the order in which they appear in the text;
if the state information of the node matching the current character is "continuous" or "stop", matching the next character against the child nodes of that node; and
if the state information of the node matching the current character is "longest", updating the text result with the current matching result and matching the next character against the child nodes of the root node of the tree model again.
Further, where the state information of the node matching the current character is "continuous" or "stop", the method further includes:
if the current character matches no node, moving on to the next character and starting the match again from the child nodes of the root node.
Further, matching the next character against the child nodes of the node matching the current character, when the state information of that node is "continuous" or "stop", includes:
if the state information of the node matching the current character is "continuous", matching the next character against the child nodes of that node;
or, if the state information of the node matching the current character is "stop", adding the current matching result to the text result and matching the next character against the child nodes of that node.
In a second aspect, the invention provides a tree search device, including:
a model building module, configured to form a tree model according to the vocabulary information; and
a matching module, configured to match the characters of a text to be analyzed against the tree model to obtain a text result corresponding to the vocabulary.
Further, the model building module is specifically configured to split each word of the vocabulary into characters and assign each character a corresponding child node;
assign state information to each child node according to the relationships between different words and the number of characters each word contains; and
assign a vocabulary meaning to the corresponding child nodes according to the vocabulary information and the state information.
Further, the matching module is specifically configured to segment the text to be analyzed into characters;
starting from the child nodes of the root node of the tree model, match the characters to nodes of the tree model in the order in which they appear in the text;
if the state information of the node matching the current character is "continuous" or "stop", match the next character against the child nodes of that node; and
if the state information of the node matching the current character is "longest", update the text result with the current matching result and match the next character against the child nodes of the root node of the tree model again.
The beneficial effects of the invention are as follows:
Forming a tree model from the vocabulary provides the basis for matching information efficiently and accurately and thereby completing the retrieval. Matching the characters of the text to be analyzed against the tree model yields the text result, so a single retrieval pass matches the text against every word in the vocabulary, obtains the corresponding vocabulary information, and shortens the matching time.
Drawings
Fig. 1 is a schematic flowchart of a tree-based search method according to an embodiment of the present invention.
Fig. 2 is a schematic structural diagram of a tree search apparatus according to a second embodiment of the present invention.
Detailed Description
In order to make the technical problems solved, technical solutions adopted and technical effects achieved by the present invention clearer, the technical solutions of the embodiments of the present invention will be described in further detail below with reference to the accompanying drawings, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all embodiments.
Embodiment One
This embodiment provides a tree search method that matches a text against all words of the vocabulary in a single retrieval pass, thereby obtaining the corresponding vocabulary information and reducing the matching time.
Fig. 1 is a schematic flowchart of a tree-based search method according to an embodiment of the present invention. As shown in fig. 1, the method specifically includes the following steps:
S11: Form a tree model according to the vocabulary information.
Specifically, each word is first split into characters, and each character is assigned a corresponding child node. For characters belonging to the same word, the first character is set as a child node of the root node, the second character as a child node of the node for the first character, the third character as a child node of the node for the second character, and so on. For example, for the word 中国 ("China"), the first character 中 ("middle") is set as a child node of the root node, and the second character 国 ("country") is set as a child node of the node for 中. For the word 祖国 ("motherland"), the first character 祖 ("ancestor") is set as a child node of the root node, and 国 is set as a child node of the node for 祖.
When one word contains another, the shared characters share the same nodes, and each differing character is set as a child node of the node for the preceding character. For example, if the vocabulary includes 中国 ("China") and 中国人 ("Chinese person"), the characters common to both words are 中 and 国: 中 shares one node, 国 shares one node, and the differing character 人 ("person") is set as a child node of the node for the preceding character 国.
State information is then assigned to each child node according to the relationships between different words and the number of characters each word contains. The state information is "continuous", "stop", or "longest".
For example, the word 中国 contains two characters; the state of the node for 中 is "continuous" and the state of the node for 国 is "longest". When the word 中国人 is added, because it contains the other word 中国, the state of the existing 国 node becomes "stop" and the state of the 人 node is "longest".
Vocabulary meanings are then assigned to the corresponding child nodes according to the vocabulary information and the state information. A vocabulary meaning includes tag information and/or sentiment information.
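Step S11 can be sketched as a character trie. The sketch below is not the patent's implementation: the names `Node`, `build_tree`, and `assign_states` and the tag strings are hypothetical, and the state rule is an assumption inferred from the 中国 / 中国人 example (a node that ends no word is "continuous", a node that ends a word but has children is "stop", and a node that ends a word and has no children is "longest").

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

CONTINUOUS, STOP, LONGEST = "continuous", "stop", "longest"


@dataclass
class Node:
    state: str = CONTINUOUS
    is_word: bool = False          # True if a vocabulary word ends at this node
    meaning: Optional[str] = None  # tag and/or sentiment information
    children: Dict[str, "Node"] = field(default_factory=dict)


def assign_states(node: Node) -> None:
    """Assumed rule: word end with children -> stop, word end without -> longest."""
    for child in node.children.values():
        if not child.is_word:
            child.state = CONTINUOUS
        elif child.children:
            child.state = STOP
        else:
            child.state = LONGEST
        assign_states(child)


def build_tree(vocabulary: Dict[str, str]) -> Node:
    """Split each word into characters; shared prefixes share the same nodes."""
    root = Node()
    for word, meaning in vocabulary.items():
        if not word:
            continue  # an empty word has no characters to place in the tree
        node = root
        for ch in word:
            node = node.children.setdefault(ch, Node())
        node.is_word, node.meaning = True, meaning
    assign_states(root)
    return root


if __name__ == "__main__":
    # Hypothetical tags standing in for the "vocabulary meaning".
    tree = build_tree({"中国": "tag:country", "中国人": "tag:person", "祖国": "tag:country"})
    guo = tree.children["中"].children["国"]
    print(tree.children["中"].state, guo.state, guo.children["人"].state)
```

Computing the states after all words are inserted keeps the result independent of insertion order, so adding 中国人 before or after 中国 gives the same labels.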
S12: Match the characters of the text to be analyzed against the tree model to obtain the text result corresponding to the vocabulary.
Specifically, the text to be analyzed is first segmented into characters, and each character serves as one matching unit.
Then, starting from the child nodes of the root node of the tree model, the characters are matched to nodes of the tree model in the order in which they appear in the text.
Preferably, the method further includes: if the current character matches no node, moving on to the next character and starting the match again from the child nodes of the root node. This excludes characters that carry no vocabulary meaning and avoids inaccurate text results.
If the state information of the node matching the current character is "continuous" or "stop", the next character is matched against the child nodes of that node.
Preferably, if the state information of the node matching the current character is "continuous", the next character is matched against the child nodes of that node.
Alternatively, if the state information of the node matching the current character is "stop", the current matching result is added to the text result, and the next character is matched against the child nodes of that node.
If the state information of the node matching the current character is "longest", the text result is updated with the current matching result, and the next character is matched against the child nodes of the root node of the tree model again.
For example, suppose the text to be analyzed is "I am a Chinese and I love my country." The text is segmented into characters, and matching of the first character 我 ("I") starts from the root node of the tree model, but no node matches. The method moves on to the next character 是 ("am"), which also matches no node. It then moves on to the next character 中 (the first character of 中国, "China"), which matches a node whose state is "continuous", so the method moves on to the next character 国 and matches it against the child nodes of the node for the previous character 中. The matched node has the state "stop", so the current matching result 中国 ("China") is added to the text result. Because that node's state is "stop", matching continues from the child nodes of the node for 国: the next character 人 matches a node whose state is "longest", so the text result is updated with the current matching result 中国人 ("Chinese person").
The method then moves on to the next character 我 ("I") and starts matching again from the child nodes of the root node of the tree model. This continues until the last character has been matched.
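The matching walk in step S12 can be sketched as a single left-to-right pass over the text. This is a sketch under stated assumptions rather than the patent's implementation: `Node`, `build_tree`, and `match` are hypothetical names, the state rule is inferred from the 中国 / 中国人 example, the sample sentence is one plausible Chinese rendering of the example text, and because the source does not say whether "updating" replaces an earlier shorter match, the sketch simply appends both the "stop" and the "longest" matches.

```python
from typing import Dict, List, Optional, Tuple


class Node:
    def __init__(self) -> None:
        self.children: Dict[str, "Node"] = {}
        self.is_word = False
        self.meaning: Optional[str] = None

    @property
    def state(self) -> str:
        # Assumed rule, inferred from the example in the text.
        if not self.is_word:
            return "continuous"
        return "stop" if self.children else "longest"


def build_tree(vocabulary: Dict[str, str]) -> Node:
    root = Node()
    for word, meaning in vocabulary.items():
        if not word:
            continue
        node = root
        for ch in word:
            node = node.children.setdefault(ch, Node())
        node.is_word, node.meaning = True, meaning
    return root


def match(text: str, root: Node) -> List[Tuple[str, Optional[str]]]:
    results: List[Tuple[str, Optional[str]]] = []
    node, start = root, 0
    for i, ch in enumerate(text):
        child = node.children.get(ch)
        if child is None:
            # No matching node: move on and restart from the root's children.
            # Following the text literally, the current character is not retried.
            node = root
            continue
        if node is root:
            start = i  # a new candidate match begins at this character
        state = child.state
        if state in ("stop", "longest"):
            results.append((text[start:i + 1], child.meaning))
        # "longest" restarts from the root; "continuous"/"stop" descend.
        node = root if state == "longest" else child
    return results


if __name__ == "__main__":
    vocab = {"中国": "tag:country", "中国人": "tag:person", "祖国": "tag:country"}
    print(match("我是中国人，我爱我的祖国。", build_tree(vocab)))
    # [('中国', 'tag:country'), ('中国人', 'tag:person'), ('祖国', 'tag:country')]
```

Each character is examined once, so the work depends on the length of the text rather than on the number of words in the vocabulary.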
In this embodiment, forming a tree model from the vocabulary provides the basis for matching information efficiently and accurately and thereby completing the retrieval. Matching the characters of the text to be analyzed against the tree model yields the text result, so a single retrieval pass matches the text against every word in the vocabulary, obtains the corresponding vocabulary information, and shortens the matching time.
Embodiment Two
This embodiment provides a tree search device for implementing the tree search method described in the foregoing embodiment; it solves the same technical problem and achieves the same technical effect. Fig. 2 is a schematic structural diagram of a tree search device according to the second embodiment of the present invention. As shown in Fig. 2, the device includes:
A model building module 10, configured to form a tree model according to the vocabulary information.
Further, the model building module 10 is specifically configured to first split each word into characters and assign each character a corresponding child node; then assign state information to each child node according to the relationships between different words and the number of characters each word contains; and then assign vocabulary meanings to the corresponding child nodes according to the vocabulary information and the state information.
A matching module 20, configured to match the characters of the text to be analyzed against the tree model to obtain a text result corresponding to the vocabulary.
Further, the matching module 20 is specifically configured to segment the text to be analyzed into characters; starting from the child nodes of the root node of the tree model, match the characters to nodes of the tree model in the order in which they appear in the text;
if the state information of the node matching the current character is "continuous" or "stop", match the next character against the child nodes of that node;
and if the state information of the node matching the current character is "longest", update the text result with the current matching result and match the next character against the child nodes of the root node of the tree model again.
In this embodiment, the model building module provides the basis for matching information efficiently and accurately, and the cooperation of the model building module and the matching module matches the text against all words of the vocabulary in a single retrieval pass, thereby obtaining the corresponding vocabulary information and shortening the matching time.
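The two-module structure of the device can be sketched as two small classes composed by a third. All class and method names here (`ModelBuildingModule`, `MatchingModule`, `TreeSearchDevice`, `build`, `match`, `search`) are hypothetical, nodes are plain dictionaries for brevity, the state rule is the one inferred from the example in Embodiment One, and English sentiment words are used only to show that the same character-level walk applies to any script.

```python
from typing import Any, Dict, List, Tuple

NodeDict = Dict[str, Any]


class ModelBuildingModule:
    """Hypothetical counterpart of module 10: builds the tree model."""

    @staticmethod
    def _new_node() -> NodeDict:
        return {"children": {}, "is_word": False, "meaning": None, "state": "continuous"}

    def build(self, vocabulary: Dict[str, str]) -> NodeDict:
        root = self._new_node()
        for word, meaning in vocabulary.items():
            node = root
            for ch in word:
                node = node["children"].setdefault(ch, self._new_node())
            if node is not root:  # ignore empty words
                node["is_word"], node["meaning"] = True, meaning
        self._assign_states(root)
        return root

    def _assign_states(self, node: NodeDict) -> None:
        # Assumed rule: word end with children -> stop, without -> longest.
        for child in node["children"].values():
            if child["is_word"]:
                child["state"] = "stop" if child["children"] else "longest"
            self._assign_states(child)


class MatchingModule:
    """Hypothetical counterpart of module 20: walks the text over the tree."""

    def match(self, root: NodeDict, text: str) -> List[Tuple[str, str]]:
        results: List[Tuple[str, str]] = []
        node, start = root, 0
        for i, ch in enumerate(text):
            child = node["children"].get(ch)
            if child is None:
                node = root  # no match: restart from the root at the next character
                continue
            if node is root:
                start = i
            if child["state"] in ("stop", "longest"):
                results.append((text[start:i + 1], child["meaning"]))
            node = root if child["state"] == "longest" else child
        return results


class TreeSearchDevice:
    def __init__(self, vocabulary: Dict[str, str]) -> None:
        self.tree = ModelBuildingModule().build(vocabulary)
        self.matcher = MatchingModule()

    def search(self, text: str) -> List[Tuple[str, str]]:
        return self.matcher.match(self.tree, text)


if __name__ == "__main__":
    device = TreeSearchDevice({"good": "positive", "goodness": "positive", "bad": "negative"})
    print(device.search("goodness is not bad"))
```

Keeping the tree construction separate from the matcher means the model is built once and can be reused for any number of texts.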
The technical principle of the present invention is described above in connection with specific embodiments. The description is made for the purpose of illustrating the principles of the invention and should not be construed in any way as limiting the scope of the invention. Based on the explanations herein, those skilled in the art will be able to conceive of other embodiments of the present invention without inventive effort, which would fall within the scope of the present invention.

Claims (10)

1. A tree search method, comprising:
forming a tree model according to the information of the vocabulary;
and matching the characters of the text to be analyzed with the tree model so as to obtain a text result corresponding to the vocabulary.
2. The tree search method of claim 1, wherein forming the tree model according to the vocabulary information comprises:
splitting each word of the vocabulary into characters and assigning each character a corresponding child node;
assigning state information to each child node according to the relationships between different words and the number of characters each word contains; and
assigning a vocabulary meaning to the corresponding child nodes according to the vocabulary information and the state information.
3. The tree search method of claim 2, wherein the vocabulary meaning includes tag information and/or sentiment information.
4. The tree search method of claim 2, wherein the state information is "continuous", "stop", or "longest".
5. The tree search method of claim 4, wherein matching the characters of the text to be analyzed against the tree model to obtain the text result corresponding to the vocabulary comprises:
segmenting the text to be analyzed into characters;
starting from the child nodes of the root node of the tree model, matching the characters to nodes of the tree model in the order in which they appear in the text;
if the state information of the node matching the current character is "continuous" or "stop", matching the next character against the child nodes of that node; and
if the state information of the node matching the current character is "longest", updating the text result with the current matching result and matching the next character against the child nodes of the root node of the tree model again.
6. The tree search method of claim 5, wherein, where the state information of the node matching the current character is "continuous" or "stop", the method further comprises:
if the current character matches no node, moving on to the next character and starting the match again from the child nodes of the root node.
7. The tree search method of claim 5, wherein matching the next character against the child nodes of the node matching the current character, when the state information of that node is "continuous" or "stop", comprises:
if the state information of the node matching the current character is "continuous", matching the next character against the child nodes of that node;
or, if the state information of the node matching the current character is "stop", adding the current matching result to the text result and matching the next character against the child nodes of that node.
8. A tree search device, comprising:
the model building module is used for forming a tree model according to the information of the vocabulary;
and the matching module is used for matching the characters of the text to be analyzed with the tree model so as to obtain a text result corresponding to the vocabulary.
9. The tree search device according to claim 8, wherein:
the model building module is specifically configured to split each word of the vocabulary into characters and assign each character a corresponding child node;
assign state information to each child node according to the relationships between different words and the number of characters each word contains; and
assign a vocabulary meaning to the corresponding child nodes according to the vocabulary information and the state information.
10. The tree search device according to claim 8, wherein:
the matching module is specifically configured to segment the text to be analyzed into characters;
starting from the child nodes of the root node of the tree model, match the characters to nodes of the tree model in the order in which they appear in the text;
if the state information of the node matching the current character is "continuous" or "stop", match the next character against the child nodes of that node; and
if the state information of the node matching the current character is "longest", update the text result with the current matching result and match the next character against the child nodes of the root node of the tree model again.
CN201911293832.6A 2019-12-16 2019-12-16 Tree type retrieval method and device Pending CN111061829A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911293832.6A CN111061829A (en) 2019-12-16 2019-12-16 Tree type retrieval method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911293832.6A CN111061829A (en) 2019-12-16 2019-12-16 Tree type retrieval method and device

Publications (1)

Publication Number Publication Date
CN111061829A (en) 2020-04-24

Family

ID=70300736

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911293832.6A Pending CN111061829A (en) 2019-12-16 2019-12-16 Tree type retrieval method and device

Country Status (1)

Country Link
CN (1) CN111061829A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111898381A (en) * 2020-06-30 2020-11-06 北京来也网络科技有限公司 Text information extraction method, device, equipment and medium combining RPA and AI

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102682017A (en) * 2011-03-15 2012-09-19 阿里巴巴集团控股有限公司 Information retrieval method and system
CN103377259A (en) * 2012-04-28 2013-10-30 北京新媒传信科技有限公司 Multiple-mode-string matching method and device
CN103383699A (en) * 2013-06-28 2013-11-06 安徽科大讯飞信息科技股份有限公司 Character string retrieval method and system
CN105183788A (en) * 2015-08-20 2015-12-23 及时标讯网络信息技术(北京)有限公司 Operation method for Chinese AC automatic machine based on retrieval of keyword dictionary tree
US10230639B1 (en) * 2017-08-08 2019-03-12 Innovium, Inc. Enhanced prefix matching
CN109918664A (en) * 2019-03-05 2019-06-21 北京声智科技有限公司 Segmenting method and device

Similar Documents

Publication Publication Date Title
US10642938B2 (en) Artificial intelligence based method and apparatus for constructing comment graph
US20200301954A1 (en) Reply information obtaining method and apparatus
CN103123618B (en) Text similarity acquisition methods and device
CN105095182B (en) A kind of return information recommendation method and device
CN110457672B (en) Keyword determination method and device, electronic equipment and storage medium
CN108664599B (en) Intelligent question-answering method and device, intelligent question-answering server and storage medium
CN107146610A (en) A kind of determination method and device of user view
CN104933130A (en) Comment information marking method and comment information marking device
CN103365992A (en) Method for realizing dictionary search of Trie tree based on one-dimensional linear space
CN110059177B (en) Activity recommendation method and device based on user portrait
CN109918664B (en) Word segmentation method and device
CN110427478A (en) A kind of the question and answer searching method and system of knowledge based map
CN111368072A (en) Microblog hot topic discovery algorithm based on linear fusion of BTM and GloVe similarity
CN109522396B (en) Knowledge processing method and system for national defense science and technology field
Tur et al. Towards unsupervised spoken language understanding: Exploiting query click logs for slot filling
JP2013196680A (en) Concept recognition method and concept recognition device based on co-learning
CN111061829A (en) Tree type retrieval method and device
CN103309851B (en) The rubbish recognition methods of short text and system
CN111597302B (en) Text event acquisition method and device, electronic equipment and storage medium
CN106933818A (en) A kind of quick multiple key text matching technique and device
CN110705285B (en) Government affair text subject word library construction method, device, server and readable storage medium
CN104077419B (en) With reference to semantic method for reordering is retrieved with the long query image of visual information
US7853597B2 (en) Product line extraction
CN114861608A (en) Data processing method, device, equipment and storage medium
CN105183807A (en) emotion reason event identifying method and system based on structure syntax

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20200424)