CN105912606A - Synonym expansion based relational database keyword search method - Google Patents

Synonym expansion based relational database keyword search method

Info

Publication number
CN105912606A
Authority
CN
China
Prior art keywords
synonym
database
datagram
synonyms
word
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201610209883.6A
Other languages
Chinese (zh)
Inventor
黄定芳
刘和云
谢东
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hunan University of Humanities Science and Technology
Original Assignee
Hunan University of Humanities Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hunan University of Humanities Science and Technology filed Critical Hunan University of Humanities Science and Technology
Priority to CN201610209883.6A priority Critical patent/CN105912606A/en
Publication of CN105912606A publication Critical patent/CN105912606A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/242 Query formulation
    • G06F16/2425 Iterative querying; Query formulation based on the results of a preceding query
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases
    • G06F16/285 Clustering or classification

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a relational database keyword search method based on synonym expansion. The method comprises the following steps in order: establishing a synonym database by classifying a plurality of words into synonym groups and storing them in the synonym database, wherein the synonym database contains multiple synonym groups K and each synonym group consists of several synonyms (k1, k2, ..., kn); during a search, receiving from the user a keyword group X consisting of several keywords (x1, x2, ..., xn); and entering the keywords into the synonym database and matching them against the words in it, wherein if a keyword matches a word in the synonym database, the words of the synonym group containing that word are the synonyms of the keyword. By using this query expansion method based on a synonym database and combining keyword search technology with the characteristics of each field, keyword search systems applicable to different fields can be developed, and the precision and recall of the system for keywords can be improved.

Description

Relational database keyword search method based on synonym expansion
Technical field
The present invention relates to keyword retrieval and search in databases, and in particular to a relational database keyword search method based on synonym expansion.
Background
With the development of information technology, the volume of data in the information world keeps growing, and it mixes data of every kind from every industry and field. This requires keyword retrieval technology to develop and mature accordingly: what a user searches for is no longer just the single keyword itself, but more often the various pieces of information related to that keyword.
In most relational database keyword retrieval systems, keyword search uses only simple matching and lacks semantic expansion of the keywords. For example, when a user enters a keyword, the system has no way to determine which field the keyword belongs to. Because the retrieval system lacks semantic matching, terms from different fields that are near-synonymous or identical easily cause problems, and it is difficult to find exactly the results the user wants. This is particularly so in the agricultural field. For example, when a user enters "apple", the results include Apple-brand electronic products such as mobile phones as well as the fruit, which makes it hard for the user to choose. When a user enters one common name for the potato, the results contain only records that include that exact keyword, while records in the database that use another name for the potato are excluded. Semantic and syntactic analysis of the keywords entered by the user is therefore also a focus of keyword retrieval technology. Expanding the user's query keywords and then searching with semantic matching not only improves the query accuracy of the system but also makes the returned results better match the user's needs.
Summary of the invention
In view of the defects and shortcomings of the prior art described above, the technical problem to be solved by the present invention is to provide a relational database keyword search method based on synonym expansion that effectively improves the accuracy and reliability of keyword queries in the system, so as to meet the needs of users.
To achieve the above object, the present invention provides a relational database keyword search method based on synonym expansion, comprising the following steps in order:
1) Establish a synonym database: multiple words are classified by synonymy and stored in the synonym database. The synonym database contains multiple synonym groups K, and each synonym group consists of several synonyms k1, k2, ..., kn, so that K={k1, k2, ..., kn};
2) During a search, the user provides a keyword group X consisting of several keywords x1, x2, ..., xn, so that X={x1, x2, ..., xn};
3) The keywords are entered into the synonym database and matched against the words in the synonym database;
if a keyword matches a word in the synonym database, the words of the synonym group containing that word are the synonyms of the keyword.
Further, each synonym group is placed in a single row, and adjacent synonyms are separated by a space.
Preferably, the first synonym k1 in each synonym group is the most common word in that group.
Further, step 3) also comprises the following steps:
31) Build the data graph G of the synonym database: the data graph G comprises a number of nodes v and a number of edges e formed between those nodes, so that G comprises the node set V(G) and the edge set E(G), and each edge e of G has a weight $W_G(e)$;
32) Find the Steiner trees in the data graph G: let $V' \subseteq V(G)$ be a node set in G. If T is a connected subtree of G and T contains all the nodes in $V'$, then T is a Steiner tree of G with respect to $V'$;
33) Find the minimum Steiner tree in the data graph G: the weight of a Steiner tree T is:
$$C(T) = \sum_{e \in E(T)} W_G(e);$$
where E(T) is the set of edges in the Steiner tree T and $W_G(e)$ is the weight of edge e;
when C(T) is minimal, the corresponding T is a minimum Steiner tree;
34) Find the Top-k Steiner trees in the data graph G: the minimum Steiner trees corresponding to the keyword group X={x1, x2, ..., xn} in G are T1, T2, ..., Tn, sorted in ascending order of their total edge weight; T1, T2, ..., Tn are the Top-k Steiner trees of the keyword group X={x1, x2, ..., xn};
35) Return the Top-k Steiner trees to the user.
The relational database keyword search method based on synonym expansion according to the present invention has the following advantages:
The application uses a query expansion method based on a synonym database to expand the semantics of words, and combines keyword retrieval technology with the characteristics of each major field, so that keyword retrieval systems suited to different fields can be developed and applied in those fields, thereby improving the precision and recall of the system for keywords.
The above is only an overview of the technical solution of the present invention. So that the technical means of the invention can be understood more clearly and practiced according to the content of the specification, preferred embodiments of the invention are described in detail below with reference to the accompanying drawing.
Brief description of the drawings
Fig. 1 is a flow chart of the present invention.
Detailed description
The preferred embodiments of the present invention are described in detail below with reference to the accompanying drawing.
As shown in Fig. 1, the present invention provides a relational database keyword search method based on synonym expansion, comprising the following steps in order:
Step 1), establish the synonym database: multiple words are classified by synonymy and stored in the synonym database. Each synonym group is placed in a single row, and adjacent synonyms are separated by a space. The synonym database contains multiple synonym groups K, and each synonym group consists of several synonyms k1, k2, ..., kn, so that K={k1, k2, ..., kn}. In addition, the first synonym k1 in each synonym group is the most common word in that group. For example, in the agricultural field, three common names for the potato form one synonym group and are placed in the same row of the synonym database, with the most common name as the first word of that row. Likewise, three common names for the sweet potato form one synonym group, are placed in the same row of the synonym database, and the most common of them is the first word of that row.
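The row layout described in step 1) can be sketched as follows. This is a minimal illustration only: the function names `load_synonym_groups` and `build_index`, the in-memory list of rows, and the English placeholder words are assumptions made for the example and do not come from the patent.

```python
def load_synonym_groups(lines):
    """Parse rows of space-separated synonyms into a list of groups.

    Each non-empty row is one synonym group; by the convention in step 1)
    the first word of a row is the most common word of the group.
    """
    groups = []
    for line in lines:
        words = line.split()
        if words:  # skip blank rows
            groups.append(words)
    return groups


def build_index(groups):
    """Map every word to the row number of the group that contains it."""
    index = {}
    for row, words in enumerate(groups):
        for word in words:
            # Keep the first row in which a word appears (assumed policy).
            index.setdefault(word, row)
    return index


# Hypothetical rows; the words are placeholders, not data from the patent.
rows = ["potato spud", "", "maize corn"]
groups = load_synonym_groups(rows)
index = build_index(groups)
```

Keeping the most common word first means `groups[row][0]` can serve as a canonical form for the whole group, which is one plausible reason for the ordering convention.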
Step 2), during a search, the user provides a keyword group X consisting of several keywords x1, x2, ..., xn, so that X={x1, x2, ..., xn}.
Step 3), the keywords are entered into the synonym database and matched against the words in the synonym database. If a keyword matches a word in the synonym database, the words of the synonym group containing that word are the synonyms of the keyword.
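The matching in step 3) amounts to a lookup from each keyword to the synonym group that contains it. The sketch below assumes exact string matching and that an unmatched keyword is kept unchanged; the patent does not state either detail, and the name `expand_keywords` and the sample words are hypothetical.

```python
def expand_keywords(keywords, groups):
    """Return a dict mapping each keyword to its list of synonyms.

    A keyword that matches a word in some group is expanded to that whole
    group; a keyword with no match is kept as a one-element list (assumed).
    """
    # Map each word to the group that contains it.
    word_to_group = {word: group for group in groups for word in group}
    expanded = {}
    for keyword in keywords:
        expanded[keyword] = list(word_to_group.get(keyword, [keyword]))
    return expanded


# Hypothetical synonym groups and keyword group X.
groups = [["potato", "spud"], ["maize", "corn"]]
result = expand_keywords(["spud", "rice"], groups)
```

Here `result["spud"]` is the full group `["potato", "spud"]`, so a later search can match records that use either name.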
Step 3) also comprises the following steps:
Step 31), build the data graph G of the synonym database: the data graph G comprises a number of nodes v and a number of edges e formed between those nodes, so G is a directed graph composed of two sets, the node set V(G) and the edge set E(G). The direction of each edge e of G is determined by the direction of the primary key-foreign key reference, and each edge e has a weight $W_G(e)$. The weight W(G) of the data graph G is: $W(G) = \sum_{e \in E(G)} W_G(e)$.
Therefore, in the directed graph G, if u and v are two nodes in the graph, weights are assigned to the edges between these two nodes as follows:
$$W_e((u, v)) = 1$$
$$W_e((v, u)) = \log_2(1 + N_{in}(v))$$
Here, $W_e((u, v))$ is the weight of the forward edge, i.e., the primary key-foreign key relationship from node u to node v; $W_e((v, u))$ is the weight of the backward edge, i.e., the primary key-foreign key relationship from node v to node u; and $N_{in}(v)$ is the number of nodes that reference v. In this embodiment, the weights of all edges in the data graph are set to 1.
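The two weight formulas can be illustrated with a short sketch. It assumes that each pair (u, v) means "tuple u references tuple v through a foreign key"; the function name `weight_edges` and the sample tuple names are hypothetical. Note that the embodiment itself simplifies all weights to 1, so this shows only the general formula.

```python
import math
from collections import Counter


def weight_edges(fk_edges):
    """Assign forward and backward weights to foreign-key edges.

    fk_edges: list of (u, v) pairs, assumed to mean that u references v.
    Returns a dict mapping directed edges to weights.
    """
    # N_in(v): number of nodes that reference v.
    n_in = Counter(v for _, v in fk_edges)
    weights = {}
    for u, v in fk_edges:
        weights[(u, v)] = 1.0                     # forward edge
        weights[(v, u)] = math.log2(1 + n_in[v])  # backward edge
    return weights


# Hypothetical data: three tuples all reference the same tuple.
edges = [("paper1", "author1"), ("paper2", "author1"), ("paper3", "author1")]
weights = weight_edges(edges)
```

With three referencing tuples, the backward weight is log2(1 + 3) = 2, so leaving a heavily referenced tuple costs more than entering it.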
Step 32), find the Steiner trees in the data graph G: let $V' \subseteq V(G)$ be a node set in G. If T is a connected subtree of G and T contains all the nodes in $V'$, then T is a Steiner tree of G with respect to $V'$.
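The definition in step 32) can be checked mechanically: the candidate must be connected, acyclic, and cover every required node. The sketch below ignores edge direction for the connectivity test, which is an assumption; the name `is_steiner_tree` is hypothetical.

```python
def is_steiner_tree(tree_edges, terminals):
    """Check that tree_edges form a tree that contains all terminals.

    tree_edges: list of (u, v) pairs; terminals: the required node set V'.
    Edge direction is ignored when testing connectivity (assumed).
    """
    terminals = set(terminals)
    if not tree_edges:
        # A tree with no edges is a single node.
        return len(terminals) == 1
    nodes = {n for edge in tree_edges for n in edge}
    if not terminals <= nodes:
        return False
    # A tree on n nodes has exactly n - 1 edges.
    if len(tree_edges) != len(nodes) - 1:
        return False
    adjacency = {n: set() for n in nodes}
    for u, v in tree_edges:
        adjacency[u].add(v)
        adjacency[v].add(u)
    # Depth-first search to confirm that every node is reachable.
    start = next(iter(nodes))
    seen = {start}
    stack = [start]
    while stack:
        node = stack.pop()
        for neighbour in adjacency[node]:
            if neighbour not in seen:
                seen.add(neighbour)
                stack.append(neighbour)
    return seen == nodes
```

For instance, the path a-b-c is a Steiner tree for the node set {a, c}, with b acting as an intermediate node that is not itself required.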
Step 33), find the minimum Steiner tree in the data graph G: the weight of a Steiner tree T is:
$$C(T) = \sum_{e \in E(T)} W_G(e);$$
where E(T) is the set of edges in the Steiner tree T and $W_G(e)$ is the weight of edge e;
when C(T) is minimal, the corresponding T is a minimum Steiner tree.
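As a small worked illustration of C(T), the sketch below sums edge weights for two candidate trees and picks the cheaper one. The weights and node names are made up for the example, and `tree_cost` is a hypothetical helper name.

```python
def tree_cost(tree_edges, weights):
    """C(T): the sum of W_G(e) over all edges e in the tree."""
    return sum(weights[edge] for edge in tree_edges)


# Hypothetical edge weights W_G(e).
weights = {("a", "b"): 1.0, ("b", "c"): 2.0, ("a", "c"): 5.0}

# Two candidate trees that both connect nodes a and c.
via_b = [("a", "b"), ("b", "c")]
direct = [("a", "c")]

cost_via_b = tree_cost(via_b, weights)
cost_direct = tree_cost(direct, weights)

# The candidate with the smallest cost is the minimum Steiner tree.
best = min([via_b, direct], key=lambda tree: tree_cost(tree, weights))
```

In this example the two-edge path through b has the lower total weight, so it is preferred over the single heavier edge.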
Step 34), find the Top-k Steiner trees in the data graph G: the minimum Steiner trees corresponding to the keyword group X={x1, x2, ..., xn} in G are T1, T2, ..., Tn, sorted in ascending order of their total edge weight, i.e., $S(T_1) \le S(T_2) \le \dots \le S(T_n)$, where $S(T_i)$ is the total edge weight of $T_i$. This ordering is similar to the way keyword search results are ranked;
T1, T2, ..., Tn are the Top-k Steiner trees of the keyword group X={x1, x2, ..., xn}.
Step 35), find the qualifying Steiner trees and return the Top-k Steiner trees to the user as the query result for the keyword group, i.e., find the k minimum Steiner trees with the smallest weights.
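The ranking in steps 34) and 35) can be sketched with the standard library, assuming the candidate trees and their total edge weights have already been computed. The name `top_k_trees` and the sample data are hypothetical.

```python
import heapq


def top_k_trees(candidates, k):
    """Return the k candidates with the smallest total weight, in ascending order.

    candidates: iterable of (total_weight, tree) pairs.
    """
    return heapq.nsmallest(k, candidates, key=lambda pair: pair[0])


# Hypothetical candidates: (total edge weight, tree label).
candidates = [(3.0, "T_a"), (1.0, "T_b"), (2.0, "T_c")]
ranked = top_k_trees(candidates, 2)
```

`heapq.nsmallest` avoids sorting the full candidate list when k is small relative to the number of candidates.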
Specifically, class-1 Steiner trees are searched first, because they are the best query results and require no tuple joins to be computed. All class-1 Steiner trees are therefore queried and output first. If k class-1 Steiner trees are found, the query ends and the k results are returned; if fewer than k class-1 Steiner trees are found, class-2 Steiner trees are queried next. A class-2 Steiner tree requires two tuples to be joined; its computation is described below.
Assume the input keyword group contains three keywords, K={k1, k2, k3}. Computing a class-2 Steiner tree requires finding the join results of two tuples. The node with the highest frequency of occurrence, node (2), is taken as one node of the class-2 Steiner tree, and a bidirectional traversal search is started from this node. If a node is found such that the result of joining it with node (2) contains all the keywords, that node is added to the Steiner tree and, together with node (2), forms a class-2 Steiner tree. This process is repeated until all class-2 Steiner trees are found.
For a class-i Steiner tree, i tuples must be found and joined.
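The class-by-class search order can be sketched as follows. This is a simplified illustration, not the procedure of the patent: it checks every edge directly instead of starting a bidirectional search from the most frequent node, and it stops at class 2. The tuple contents and all function names are hypothetical.

```python
def find_class1(tuples, keywords):
    """Class 1: single tuples that contain every keyword."""
    required = set(keywords)
    return [node for node, words in tuples.items() if required <= words]


def find_class2(tuples, edges, keywords):
    """Class 2: pairs of joined tuples that contain every keyword together."""
    required = set(keywords)
    results = []
    for u, v in edges:
        covered_alone = required <= tuples[u] or required <= tuples[v]
        if not covered_alone and required <= (tuples[u] | tuples[v]):
            results.append((u, v))
    return results


def search(tuples, edges, keywords, k):
    """Return up to k results, exhausting class 1 before trying class 2."""
    results = [(node,) for node in find_class1(tuples, keywords)]
    if len(results) < k:
        results += find_class2(tuples, edges, keywords)
    return results[:k]


# Hypothetical tuples (node -> words it contains) and join edges.
tuples = {
    "t1": {"potato", "yield", "soil"},
    "t2": {"potato", "yield"},
    "t3": {"soil"},
    "t4": {"yield"},
}
edges = [("t2", "t3"), ("t3", "t4")]
found = search(tuples, edges, ["potato", "soil"], k=2)
```

Here `t1` alone contains both keywords and is returned first; the joined pair `(t2, t3)` fills the second slot because only one class-1 result exists.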
In summary, the application uses a query expansion method based on a synonym database to expand the semantics of words, and combines keyword retrieval technology with the characteristics of each major field, so that keyword retrieval systems suited to different fields can be developed and applied in those fields, thereby improving the precision and recall of the system for keywords and solving practical problems. Meanwhile, with the rapid development of semantic technology and ontologies, ontology-based relational database keyword retrieval has also become a research focus. Using an ontology to describe the semantics of a relational database, and thereby developing semantic search technology for relational databases, can further improve the recall and precision of the system for keyword search results. The present invention therefore effectively overcomes various shortcomings of the prior art and has high industrial applicability.
The relational database keyword search method based on synonym expansion provided by the embodiments of the present invention has been described in detail above. For a person of ordinary skill in the art, the specific implementation and the scope of application may vary in accordance with the idea of the embodiments. In summary, the content of this specification should not be construed as limiting the present invention, and any change made in accordance with the design idea of the present invention falls within the scope of protection of the present invention.

Claims (4)

1. A relational database keyword search method based on synonym expansion, characterized by comprising the following steps in order:
1) establishing a synonym database: multiple words are classified by synonymy and stored in the synonym database, the synonym database contains multiple synonym groups K, and each synonym group consists of several synonyms k1, k2, ..., kn, so that K={k1, k2, ..., kn};
2) during a search, the user provides a keyword group X consisting of several keywords x1, x2, ..., xn, so that X={x1, x2, ..., xn};
3) the keywords are entered into the synonym database and matched against the words in the synonym database;
if a keyword matches a word in the synonym database, the words of the synonym group containing that word are the synonyms of the keyword.
2. The relational database keyword search method based on synonym expansion according to claim 1, characterized in that each synonym group is placed in a single row and adjacent synonyms are separated by a space.
3. The relational database keyword search method based on synonym expansion according to claim 1 or 2, characterized in that the first synonym k1 in each synonym group is the most common word in that group.
4. The relational database keyword search method based on synonym expansion according to claim 1, characterized in that step 3) further comprises the following steps:
31) building the data graph G of the synonym database: the data graph G comprises a number of nodes v and a number of edges e formed between those nodes, so that G comprises the node set V(G) and the edge set E(G), and each edge e of G has a weight $W_G(e)$;
32) finding the Steiner trees in the data graph G: let $V' \subseteq V(G)$ be a node set in G; if T is a connected subtree of G and T contains all the nodes in $V'$, then T is a Steiner tree of G with respect to $V'$;
33) finding the minimum Steiner tree in the data graph G: the weight of a Steiner tree T is:
$$C(T) = \sum_{e \in E(T)} W_G(e);$$
where E(T) is the set of edges in the Steiner tree T and $W_G(e)$ is the weight of edge e;
when C(T) is minimal, the corresponding T is a minimum Steiner tree;
34) finding the Top-k Steiner trees in the data graph G: the minimum Steiner trees corresponding to the keyword group X={x1, x2, ..., xn} in G are T1, T2, ..., Tn, sorted in ascending order of their total edge weight; T1, T2, ..., Tn are the Top-k Steiner trees of the keyword group X={x1, x2, ..., xn};
35) returning the Top-k Steiner trees to the user.
CN201610209883.6A 2016-04-05 2016-04-05 Synonym expansion based relational database keyword search method Pending CN105912606A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610209883.6A CN105912606A (en) 2016-04-05 2016-04-05 Synonym expansion based relational database keyword search method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610209883.6A CN105912606A (en) 2016-04-05 2016-04-05 Synonym expansion based relational database keyword search method

Publications (1)

Publication Number Publication Date
CN105912606A true CN105912606A (en) 2016-08-31

Family

ID=56745288

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610209883.6A Pending CN105912606A (en) 2016-04-05 2016-04-05 Synonym expansion based relational database keyword search method

Country Status (1)

Country Link
CN (1) CN105912606A (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101145153A (en) * 2006-09-13 2008-03-19 阿里巴巴公司 Method and system for searching information
US20090132522A1 (en) * 2007-10-18 2009-05-21 Sami Leino Systems and methods for organizing innovation documents
CN104008097A (en) * 2013-02-21 2014-08-27 日电(中国)有限公司 Method and device for achieving query understanding
CN103116653A (en) * 2013-03-05 2013-05-22 清华大学 Service resource searching method and system based on attribute matching
CN103177122A (en) * 2013-04-15 2013-06-26 天津理工大学 Personal document searching method based on synonyms

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
唐明珠: ""关系数据库中关键词搜索算法的研究"", 《中国优秀硕士学位论文全文数据库 信息科技辑》 *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107145512A (en) * 2017-03-31 2017-09-08 北京大学 The method and apparatus of data query
CN107145512B (en) * 2017-03-31 2019-10-18 北京大学 The method and apparatus of data query
CN107247800A (en) * 2017-06-28 2017-10-13 上海宽带技术及应用工程研究中心 Top k keyword search methodologies/system, readable storage medium storing program for executing and terminal
CN107247800B (en) * 2017-06-28 2021-04-09 上海宽带技术及应用工程研究中心 Top-k keyword search method/system, readable storage medium and terminal
CN111367947A (en) * 2020-03-09 2020-07-03 北京奇艺世纪科技有限公司 Information retrieval method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20160831
