CN105912606A - Synonym expansion based relational database keyword search method - Google Patents
- Publication number
- CN105912606A (application CN201610209883.6A)
- Authority
- CN
- China
- Prior art keywords
- synonym
- database
- datagram
- synonyms
- word
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/242—Query formulation
- G06F16/2425—Iterative querying; Query formulation based on the results of a preceding query
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/28—Databases characterised by their database models, e.g. relational or object models
- G06F16/284—Relational databases
- G06F16/285—Clustering or classification
Landscapes
- Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Mathematical Physics (AREA)
- Computational Linguistics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention provides a relational database keyword search method based on synonym expansion. The method comprises the following steps in sequence: establishing a synonym database by classifying a plurality of words into synonym groups and storing them in the synonym database, where the synonym database contains multiple synonym groups K and each group consists of several synonyms (k1, k2, ..., kn); at search time, the user supplies a keyword group X consisting of several keywords (x1, x2, ..., xn); and inputting the keywords into the synonym database and matching them against its words, where, if a keyword matches a word in the synonym database, the synonym group containing that word gives the synonyms of the keyword. By using this query expansion method based on the synonym database and combining keyword search techniques with the characteristics of each field, a keyword search system applicable to different fields can be developed, and the system's precision and recall for keyword queries can be improved.
Description
Technical field
The present invention relates to keyword retrieval and search in databases, and in particular to a relational database keyword search method based on synonym expansion.
Background
With the development of information technology, the volume of data keeps growing and mixes diverse data from every industry and field. This demands that keyword retrieval technology develop and improve: what users search for is no longer just the keyword itself, but more and more the various information associated with it.
In most of relational database keyword retrieval systems, the search to key word has simply used simple coupling, lacks
Semantic extension to key word.Such as, user inputs a key word, system this how to judge that to be searched for this of user closes
Keyword is belonging to any field, and searching system lacks the semantic matches to key word, the term of different field the most easily occurs
Nearly justice or phase parity problem, be difficult to search out the result that user wants accurately.Particularly in agriculture field, such as, work as user
During input Fructus Mali pumilae, it may appear that the electronic products such as the mobile phone of apple series, also there will be the Fructus Mali pumilae of fruits, be difficult to select for user.
When user inputs Rhizoma Solani tuber osi, result the most only occurs the information comprising key word Rhizoma Solani tuber osi, and in data base, comprises key word horse
The result of the relevant informations such as bell potato disallowable fall.Therefore, the key word to user's input carries out semanteme and syntactic analysis is also to close
Emphasis in keyword retrieval technique, uses semantic matches to scan for after being extended the searching keyword that user inputs again, this
Sample not only increases the inquiry accuracy rate of system, also makes the result returned more meet the demand of user.
Summary of the invention
In view of the above defects and shortcomings of the prior art, the technical problem to be solved by the present invention is to provide a relational database keyword search method based on synonym expansion that effectively improves the precision and reliability of keyword queries in the system, so as to meet user needs.
To achieve the above object, the present invention provides a relational database keyword search method based on synonym expansion, comprising the following steps in sequence:
1) Establish a synonym database: classify a plurality of words by synonymy and store each resulting group in the synonym database. The synonym database contains multiple synonym groups K, each consisting of several synonyms k1, k2, ..., kn, so that $K = \{k_1, k_2, \ldots, k_n\}$;
2) At search time, the user supplies a keyword group X consisting of several keywords x1, x2, ..., xn, so that $X = \{x_1, x_2, \ldots, x_n\}$;
3) Input the keywords into the synonym database and match them against the words it contains. If a keyword matches a word in the synonym database, the synonym group containing that word gives the synonyms of the keyword.
Further, each synonym group is placed on a single line, with adjacent synonyms separated by a space.
Preferably, the first synonym k1 in each synonym group is the most commonly used word in that group.
Further, step 3) comprises the following steps:
31) Build the data graph G of the synonym database: G consists of a number of nodes v and a number of edges e formed between those nodes, so G comprises a node set V(G) and an edge set E(G), and each edge e of G has a weight $W_G(e)$;
32) Find Steiner trees in G: given a node subset $V' \subseteq V(G)$, if T is a connected subtree of G that contains every node in V′, then T is a Steiner tree of G with respect to V′;
33) Find the minimum Steiner tree in G: the weight of a Steiner tree T is
$$C(T) = \sum_{e \in E(T)} W_G(e)$$
where E(T) is the edge set of T and $W_G(e)$ is the weight of edge e. The T for which C(T) is smallest is a minimum Steiner tree;
34) Find the Top-k Steiner trees in G: let T1, T2, ..., Tn be the minimum Steiner trees in G corresponding to the keyword group $X = \{x_1, x_2, \ldots, x_n\}$, sorted in ascending order of total edge weight; the first k trees in this order are the Top-k Steiner trees of X;
35) Return the Top-k Steiner trees to the user.
The relational database keyword search method based on synonym expansion according to the present invention has the following beneficial effects:
The application uses a query expansion method based on a synonym database to expand the semantics of words, and combines keyword retrieval technology with the characteristics of each major field to develop keyword retrieval systems suited to different fields, thereby improving the system's precision and recall for keywords.
The above is only an overview of the technical solution of the present invention. So that its technical means can be understood more clearly and implemented according to the content of the specification, the preferred embodiments of the present invention are described in detail below with reference to the accompanying drawing.
Brief description of the drawings
Fig. 1 is the flow chart of the present invention.
Detailed description of the invention
The preferred embodiments of the present invention are described in detail below with reference to the accompanying drawing.
As shown in Fig. 1, the present invention provides a relational database keyword search method based on synonym expansion, comprising the following steps in sequence:
Step 1) Establish the synonym database: classify a plurality of words by synonymy and store each group in the synonym database, placing each synonym group on a single line with adjacent synonyms separated by a space. The synonym database contains multiple synonym groups K, each consisting of several synonyms k1, k2, ..., kn, so that $K = \{k_1, k_2, \ldots, k_n\}$. The first synonym k1 in each group is the most commonly used word in that group. For example, in agriculture, three Chinese names for the potato form one synonym group placed on a single line of the synonym database, with the most common name as the first word of the line. Likewise, three Chinese names for the sweet potato form another synonym group on its own line, again with the most common name first.
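The storage format of step 1 (one synonym group per line, space-separated, most common word first) can be loaded into a word-to-group lookup table. The following is a minimal Python sketch under stated assumptions: the function name `load_synonym_db` and the pinyin sample rows are hypothetical, and a word that appears in several groups is simply mapped to the last group read.

```python
from typing import Dict, List


def load_synonym_db(text: str) -> Dict[str, List[str]]:
    """Map every word to its synonym group K = [k1, ..., kn].

    Hypothetical helper: one group per line, synonyms separated by whitespace,
    and the first word k1 is the most common word of the group.
    """
    index: Dict[str, List[str]] = {}
    for line in text.splitlines():
        group = line.split()  # split on whitespace; a blank line gives []
        if not group:
            continue
        for word in group:
            index[word] = group  # every member points at its whole group
    return index


# Hypothetical sample rows: romanized (pinyin) names stand in for the Chinese words.
SAMPLE = """tudou malingshu yangyu
hongshu digua fanshu"""

db = load_synonym_db(SAMPLE)
print(db["yangyu"])     # ['tudou', 'malingshu', 'yangyu']
print(db["yangyu"][0])  # tudou  (k1, the most common word in the group)
```

Because every member of a line points at the same list, looking up any synonym immediately yields the whole group and its representative word k1.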
Step 2) At search time, the user supplies a keyword group X consisting of several keywords x1, x2, ..., xn, so that $X = \{x_1, x_2, \ldots, x_n\}$.
Step 3) Input the keywords into the synonym database and match them against the words it contains. If a keyword matches a word in the synonym database, the synonym group containing that word gives the synonyms of the keyword.
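Step 3 amounts to replacing each keyword with the synonym group that contains it. A minimal Python sketch, assuming the word-to-group table is built from the space-separated lines of step 1; the helper names are hypothetical, and keeping an unmatched keyword as a one-word group is an assumption the text does not spell out.

```python
from typing import Dict, Iterable, List


def build_index(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Hypothetical helper: map each word to its synonym group (one group per line)."""
    index: Dict[str, List[str]] = {}
    for line in lines:
        group = line.split()
        for word in group:
            index[word] = group
    return index


def expand_keywords(keywords: List[str], index: Dict[str, List[str]]) -> List[List[str]]:
    """Expand keyword group X = [x1, ..., xn] into one synonym list per keyword.

    A keyword that matches no word in the synonym database is kept on its own
    (an assumption; the method only defines the matched case).
    """
    return [list(index.get(x, [x])) for x in keywords]


index = build_index(["tudou malingshu yangyu", "hongshu digua fanshu"])
print(expand_keywords(["malingshu", "price"], index))
# [['tudou', 'malingshu', 'yangyu'], ['price']]
```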
Step 3) further comprises the following steps:
Step 31) Build the data graph G of the synonym database. G consists of a number of nodes v and a number of edges e formed between those nodes, so it is a directed graph defined by two sets: the node set V(G) and the edge set E(G). The direction of each edge e is determined by the direction of the primary-foreign key reference, and each edge e has a weight $W_G(e)$. The weight of the data graph G is
$$W(G) = \sum_{e \in E(G)} W_G(e)$$
Accordingly, for any two nodes u and v in the directed graph G, the edges between them are weighted as follows:
$$W_e((u, v)) = 1$$
$$W_e((v, u)) = \log_2\bigl(1 + N_{in}(v)\bigr)$$
where $W_e((u, v))$ is the weight of the forward edge, i.e. the primary-foreign key relationship from node u to node v; $W_e((v, u))$ is the weight of the backward edge, i.e. the primary-foreign key relationship from node v to node u; and $N_{in}(v)$ is the number of nodes that reference v. In this embodiment, all edge weights in the data graph are set to 1.
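The edge-weighting rule above can be sketched as follows. This is a minimal Python illustration, assuming the foreign-key references between tuples are already available as (referencing, referenced) pairs; the function name and the `uniform` flag (which reproduces this embodiment's choice of setting every weight to 1) are hypothetical. When two tuples reference each other, the forward weight of 1 is kept.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple

Edge = Tuple[str, str]


def build_data_graph(fk_refs: List[Edge], uniform: bool = False) -> Dict[Edge, float]:
    """Weighted directed edges of data graph G from foreign-key references.

    Each pair (u, v) means tuple u references tuple v through a foreign key.
    Forward edge (u, v) gets weight 1; backward edge (v, u) gets
    log2(1 + N_in(v)), where N_in(v) counts the tuples that reference v.
    """
    n_in: Dict[str, int] = defaultdict(int)
    for _, v in fk_refs:
        n_in[v] += 1
    weights: Dict[Edge, float] = {}
    for u, v in fk_refs:
        weights[(u, v)] = 1.0  # forward edge always has weight 1
        backward = 1.0 if uniform else math.log2(1 + n_in[v])
        # setdefault keeps an existing weight, so a forward edge is never overwritten
        weights.setdefault((v, u), backward)
    return weights


w = build_data_graph([("order1", "cust1"), ("order2", "cust1"), ("order3", "cust2")])
print(w[("order1", "cust1")])            # 1.0
print(round(w[("cust1", "order1")], 3))  # 1.585  (two orders reference cust1)
print(w[("cust2", "order3")])            # 1.0    (log2(1 + 1))
```

The logarithmic backward weight makes it more expensive to traverse "into" a heavily referenced tuple, which discourages trees that connect keywords only through a popular hub.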
Step 32) Find Steiner trees in the data graph G: given a node subset $V' \subseteq V(G)$, if T is a connected subtree of G that contains every node in V′, then T is a Steiner tree of G with respect to V′.
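The definition in step 32 can be checked mechanically: T must use only edges of G, contain every node of V′, and be a tree (connected, with one edge fewer than it has nodes, ignoring edge direction). A minimal Python sketch; the function name and the representation of T as a node set plus a set of directed edge pairs are assumptions for illustration.

```python
from typing import Set, Tuple

Edge = Tuple[str, str]


def is_steiner_tree(
    nodes: Set[str], edges: Set[Edge], graph_edges: Set[Edge], terminals: Set[str]
) -> bool:
    """True if (nodes, edges) is a connected subtree of G that contains all of V' (terminals)."""
    if not edges <= graph_edges:  # T may only use edges of G
        return False
    if any(u not in nodes or v not in nodes for u, v in edges):
        return False
    if not terminals <= nodes:  # T must contain every node of V'
        return False
    if not nodes or len(edges) != len(nodes) - 1:  # a tree on n nodes has n - 1 edges
        return False
    # Connectivity check on the undirected view of T (depth-first traversal).
    adj = {n: set() for n in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    start = next(iter(nodes))
    seen, stack = {start}, [start]
    while stack:
        for nb in adj[stack.pop()]:
            if nb not in seen:
                seen.add(nb)
                stack.append(nb)
    return seen == nodes


G = {("a", "b"), ("b", "c"), ("c", "d")}
print(is_steiner_tree({"a", "b", "c"}, {("a", "b"), ("b", "c")}, G, {"a", "c"}))  # True
print(is_steiner_tree({"a", "b", "c"}, {("a", "b"), ("b", "c")}, G, {"a", "d"}))  # False
```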
Step 33) Find the minimum Steiner tree in G: the weight of a Steiner tree T is
$$C(T) = \sum_{e \in E(T)} W_G(e)$$
where E(T) is the edge set of T and $W_G(e)$ is the weight of edge e. The T for which C(T) is smallest is a minimum Steiner tree.
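A minimal Python sketch of C(T) and of choosing a minimum Steiner tree among candidates that have already been enumerated, assuming edge weights are stored in a dictionary keyed by directed edge; the function names are hypothetical, and enumerating the candidate trees themselves is not shown.

```python
from typing import Dict, FrozenSet, Iterable, Tuple

Edge = Tuple[str, str]
Tree = FrozenSet[Edge]


def tree_cost(tree: Tree, weights: Dict[Edge, float]) -> float:
    """C(T): sum of W_G(e) over the edges e in E(T)."""
    return sum(weights[e] for e in tree)


def minimum_steiner_tree(candidates: Iterable[Tree], weights: Dict[Edge, float]) -> Tree:
    """Return the candidate Steiner tree with the smallest C(T)."""
    return min(candidates, key=lambda t: tree_cost(t, weights))


weights = {("a", "b"): 1.0, ("b", "c"): 2.0, ("a", "c"): 4.0}
t1 = frozenset({("a", "b"), ("b", "c")})  # connects a and c via b, C(T) = 3.0
t2 = frozenset({("a", "c")})              # connects a and c directly, C(T) = 4.0
print(tree_cost(t1, weights))                         # 3.0
print(minimum_steiner_tree([t1, t2], weights) == t1)  # True
```

Both candidates are Steiner trees for V′ = {a, c}; the two-edge tree wins because its total weight is lower, even though it uses more edges.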
Step 34) Find the Top-k Steiner trees in G: let T1, T2, ..., Tn be the minimum Steiner trees in G corresponding to the keyword group $X = \{x_1, x_2, \ldots, x_n\}$, sorted in ascending order of total edge weight, i.e. $S(T_1) \le S(T_2) \le \cdots \le S(T_n)$, where S(T) is the total edge weight of T; this is similar to how keyword search results are ranked.
The first k trees in this order are the Top-k Steiner trees of the keyword group X.
Step 35) Find the qualifying Steiner trees and return the Top-k Steiner trees to the user as the query result for the keyword group, i.e. find the first k minimum Steiner trees.
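The ranking of step 34 and the cut-off of step 35 amount to keeping the k candidate trees with the smallest total edge weight S(T). A minimal Python sketch using the standard-library `heapq.nsmallest`, assuming the candidate trees have already been enumerated; the function name is hypothetical.

```python
import heapq
from typing import Dict, FrozenSet, Iterable, List, Tuple

Edge = Tuple[str, str]
Tree = FrozenSet[Edge]


def top_k_steiner_trees(
    candidates: Iterable[Tree], weights: Dict[Edge, float], k: int
) -> List[Tree]:
    """Return the k trees with the smallest S(T), ordered S(T1) <= S(T2) <= ..."""
    return heapq.nsmallest(k, candidates, key=lambda t: sum(weights[e] for e in t))


weights = {("a", "b"): 1.0, ("b", "c"): 2.0, ("a", "c"): 5.0, ("c", "d"): 0.5}
candidates = [
    frozenset({("a", "c")}),              # S = 5.0
    frozenset({("c", "d")}),              # S = 0.5
    frozenset({("a", "b"), ("b", "c")}),  # S = 3.0
]
for tree in top_k_steiner_trees(candidates, weights, k=2):
    print(sorted(tree))
# [('c', 'd')]
# [('a', 'b'), ('b', 'c')]
```

`heapq.nsmallest` avoids sorting the whole candidate list when k is much smaller than the number of candidates.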
Specifically, class-1 Steiner trees are sought first, because they are the best query results and require no joins between tuples. All class-1 Steiner trees are therefore queried and output first. If k class-1 Steiner trees are found, the query ends and k results are returned; if fewer than k exist, class-2 Steiner trees are queried. A class-2 Steiner tree requires joining two tuples; it is computed as follows.
Suppose the input keyword group contains three keywords $X = \{x_1, x_2, x_3\}$. Computing class-2 Steiner trees requires finding join results of two tuples. Take the node with the highest frequency of occurrence, node (2), as one node of the class-2 Steiner tree, and traverse the graph from this node using bidirectional search. If a node is found whose join with node (2) contains all of the keywords, add that node to the Steiner tree; together with node (2) it forms a class-2 Steiner tree. Repeat this process until all class-2 Steiner trees have been found.
For a class-i Steiner tree, i tuples must be found and joined.
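The class-by-class search of step 35 can be sketched for classes 1 and 2 as follows. This is a simplified Python illustration under several assumptions: each tuple is represented by the set of keywords it contains, the join partners of a tuple are given as an undirected adjacency list (standing in for the bidirectional search), "frequency of occurrence" is read as the number of query keywords a tuple contains, and a pair is reported only when neither tuple is already a class-1 result. All names and the sample data are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def class_based_search(
    tuples: Dict[str, Set[str]],
    adjacency: Dict[str, Set[str]],
    keywords: Set[str],
    k: int,
) -> List[Tuple[str, ...]]:
    """Return up to k results: class-1 trees (one tuple) before class-2 trees (two joined tuples)."""
    if k <= 0:
        return []
    results: List[Tuple[str, ...]] = []
    # Class-1: a single tuple already contains every keyword, so no join is needed.
    class1 = [t for t, words in tuples.items() if keywords <= words]
    class1_set = set(class1)
    for t in class1:
        results.append((t,))
        if len(results) == k:
            return results
    # Class-2: seed from the tuples containing the most query keywords.
    seeds = sorted(tuples, key=lambda t: len(tuples[t] & keywords), reverse=True)
    seen: Set[frozenset] = set()
    for seed in seeds:
        for nb in sorted(adjacency.get(seed, set())):  # join partners in either direction
            pair = frozenset((seed, nb))
            if pair in seen or seed in class1_set or nb in class1_set:
                continue
            if keywords <= (tuples[seed] | tuples.get(nb, set())):
                seen.add(pair)
                results.append(tuple(sorted(pair)))
                if len(results) == k:
                    return results
    return results


tuples = {"p1": {"potato", "price"}, "p2": {"potato"}, "m1": {"market"}, "m2": {"market", "price"}}
adjacency = {"p1": {"m1"}, "m1": {"p1", "p2"}, "p2": {"m1", "m2"}, "m2": {"p2"}}
print(class_based_search(tuples, adjacency, {"potato", "price", "market"}, k=5))
# [('m1', 'p1'), ('m2', 'p2')]
```

Extending this to class i would mean joining i tuples, which grows quickly in cost; that is why the method stops as soon as k results have been collected.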
In summary, the application uses a query expansion method based on a synonym database to expand the semantics of words, and combines keyword retrieval technology with the characteristics of each major field to develop keyword retrieval systems suited to different fields, thereby improving the system's precision and recall for keywords and solving practical problems. Meanwhile, with the rapid development of semantic technology and ontologies, ontology-based keyword retrieval over relational databases has also become a research hotspot. Using ontologies to describe the semantics of a relational database, and thereby developing semantic search over relational databases, can further improve the recall and precision of keyword search results. The present invention thus effectively overcomes various shortcomings of the prior art and has high industrial applicability.
The relational database keyword search method based on synonym expansion provided by the embodiments of the present invention has been described in detail above. Those of ordinary skill in the art may make changes to the specific implementation and scope of application according to the ideas of the embodiments. Accordingly, the content of this specification should not be construed as limiting the present invention, and any change made according to the design concept of the present invention falls within its scope of protection.
Claims (4)
1. A relational database keyword search method based on synonym expansion, characterized by comprising the following steps in sequence:
1) establishing a synonym database: classifying a plurality of words by synonymy and storing each group in the synonym database, the synonym database containing multiple synonym groups K, each consisting of several synonyms k1, k2, ..., kn, so that $K = \{k_1, k_2, \ldots, k_n\}$;
2) at search time, the user supplying a keyword group X consisting of several keywords x1, x2, ..., xn, so that $X = \{x_1, x_2, \ldots, x_n\}$;
3) inputting the keywords into the synonym database and matching them against the words in the synonym database; if a keyword matches a word in the synonym database, the synonym group containing that word gives the synonyms of the keyword.
2. The relational database keyword search method based on synonym expansion according to claim 1, characterized in that each synonym group is placed on a single line, with adjacent synonyms separated by a space.
3. The relational database keyword search method based on synonym expansion according to claim 1 or 2, characterized in that the first synonym k1 in each synonym group is the most commonly used word in that group.
4. The relational database keyword search method based on synonym expansion according to claim 1, characterized in that step 3) further comprises the following steps:
31) building the data graph G of the synonym database: G consists of a number of nodes v and a number of edges e formed between those nodes, so G comprises a node set V(G) and an edge set E(G), and each edge e of G has a weight $W_G(e)$;
32) finding Steiner trees in G: given a node subset $V' \subseteq V(G)$, if T is a connected subtree of G that contains every node in V′, then T is a Steiner tree of G with respect to V′;
33) finding the minimum Steiner tree in G: the weight of a Steiner tree T is
$$C(T) = \sum_{e \in E(T)} W_G(e)$$
where E(T) is the edge set of T and $W_G(e)$ is the weight of edge e; the T for which C(T) is smallest is a minimum Steiner tree;
34) finding the Top-k Steiner trees in G: letting T1, T2, ..., Tn be the minimum Steiner trees in G corresponding to the keyword group $X = \{x_1, x_2, \ldots, x_n\}$, sorted in ascending order of total edge weight, the first k trees in this order being the Top-k Steiner trees of X;
35) returning the Top-k Steiner trees to the user.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610209883.6A CN105912606A (en) | 2016-04-05 | 2016-04-05 | Synonym expansion based relational database keyword search method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610209883.6A CN105912606A (en) | 2016-04-05 | 2016-04-05 | Synonym expansion based relational database keyword search method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN105912606A | 2016-08-31 |
Family
ID=56745288
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610209883.6A Pending CN105912606A (en) | 2016-04-05 | 2016-04-05 | Synonym expansion based relational database keyword search method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN105912606A (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101145153A (en) * | 2006-09-13 | 2008-03-19 | Alibaba Group | Method and system for searching information |
US20090132522A1 (en) * | 2007-10-18 | 2009-05-21 | Sami Leino | Systems and methods for organizing innovation documents |
CN103116653A (en) * | 2013-03-05 | 2013-05-22 | Tsinghua University | Service resource searching method and system based on attribute matching |
CN103177122A (en) * | 2013-04-15 | 2013-06-26 | Tianjin University of Technology | Personal document searching method based on synonyms |
CN104008097A (en) * | 2013-02-21 | 2014-08-27 | NEC (China) Co., Ltd. | Method and device for achieving query understanding |
Non-Patent Citations (1)
Title |
---|
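Tang Mingzhu, "Research on Keyword Search Algorithms in Relational Databases", China Master's Theses Full-text Database, Information Science and Technology Series * |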
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107145512A (en) * | 2017-03-31 | 2017-09-08 | Peking University | Method and apparatus for data query |
CN107145512B (en) * | 2017-03-31 | 2019-10-18 | Peking University | Method and apparatus for data query |
CN107247800A (en) * | 2017-06-28 | 2017-10-13 | Shanghai Broadband Technology and Application Engineering Research Center | Top-k keyword search method/system, readable storage medium and terminal |
CN107247800B (en) * | 2017-06-28 | 2021-04-09 | Shanghai Broadband Technology and Application Engineering Research Center | Top-k keyword search method/system, readable storage medium and terminal |
CN111367947A (en) * | 2020-03-09 | 2020-07-03 | Beijing QIYI Century Science & Technology Co., Ltd. | Information retrieval method and device, electronic equipment and storage medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN105653706B (en) | A kind of multilayer quotation based on literature content knowledge mapping recommends method | |
US8341159B2 (en) | Creating taxonomies and training data for document categorization | |
CN104636478B (en) | Information query method and equipment | |
US8266144B2 (en) | Techniques to perform relative ranking for search results | |
CN104794242B (en) | Searching method | |
CN109299383B (en) | Method and device for generating recommended word, electronic equipment and storage medium | |
CN100433007C (en) | Method for providing research result | |
CN112257419B (en) | Intelligent retrieval method and device for calculating patent document similarity based on word frequency and semantics, electronic equipment and storage medium thereof | |
CN105224648A (en) | A kind of entity link method and system | |
CN102915381B (en) | Visual network retrieval based on multi-dimensional semantic presents system and presents control method | |
CN104866572A (en) | Method for clustering network-based short texts | |
CN108846029A (en) | The information association analysis method of knowledge based map | |
CN101097570A (en) | Advertisement classification method capable of automatic recognizing classified advertisement type | |
CN112307182B (en) | Question-answering system-based pseudo-correlation feedback extended query method | |
CN103870507A (en) | Method and device of searching based on category | |
CN102081668A (en) | Information retrieval optimizing method based on domain ontology | |
CN102789452A (en) | Similar content extraction method | |
US20070168346A1 (en) | Method and system for implementing two-phased searching | |
CN105912606A (en) | Synonym expansion based relational database keyword search method | |
US20080301111A1 (en) | Method and system for providing ranked search results | |
CN108536819B (en) | Method, device, server and storage medium for comparing integer column and character string | |
CN113407579B (en) | Group query method, device, electronic equipment and readable storage medium | |
CN112507181B (en) | Search request classification method, device, electronic equipment and storage medium | |
CN102508920A (en) | Information retrieval method based on Boosting sorting algorithm | |
CN112765311A (en) | Method for searching referee document |
Legal Events
Code | Title | Description |
---|---|---|
C06 | Publication | |
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 20160831 |