CN1763739A - Search method based on semantics in search engine - Google Patents
- Publication number
- CN1763739A (application CN 200410009691)
- Authority
- CN
- China
- Prior art keywords
- file
- resource
- resource information
- search
- search engine
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Landscapes
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a semantics-based retrieval method for a file search engine, characterized by: establishing a resource information base, together with the matching relationships between that base and files and between that base and the query words users enter; when a query word matches the base, using the matching relationship to find the corresponding files; when it does not, searching for files directly with the query word and returning the search results.
Description
Technical field
The present invention relates to a method of information retrieval, and in particular to a semantics-based retrieval method in a file search engine.
Background technology
Search engines can be divided into Web search engines and file search engines. At present, the retrieval methods of known file search engines, such as Peking University's Tianwang file search engine (http://bingle.pku.edu.cn), are based on string matching: the user enters a query word, and the retrieval system looks through all file entries to be searched for those that contain this string and returns them to the user. However, neither the recall nor the precision of this method is high. First, many files are not named in a standard way, and many resources, such as software and films, have titles in several languages. When the user enters a title in one language as the query word and the file is named in another language, the file entry cannot be retrieved, which lowers recall. Second, the string a user enters usually carries semantic information, and simple string matching often returns file entries the user does not want. For example, a picture file whose name contains the query word is usually not an accurate result for a software query. Moreover, existing retrieval tools can only return query results and cannot provide more useful additional descriptive information.
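The two weaknesses described above can be reproduced with a minimal sketch of plain substring matching. The function name `substring_search` and the file entries are invented for illustration and do not come from the patent.

```python
# Baseline: plain substring matching over file entries (illustrative data only).
def substring_search(query, file_entries):
    """Return every entry whose path contains the query string, ignoring case."""
    q = query.lower()
    return [entry for entry in file_entries if q in entry.lower()]


# Hypothetical file entries; the paths are invented for this example.
entries = [
    "/pub/tools/netants.zip",
    "/pub/tools/网络蚂蚁.zip",        # same software, named in another language
    "/pub/images/netants_logo.jpg",  # contains the query word but is not the software
]

hits = substring_search("netants", entries)
```

In this sketch the entry named in another language is missed (a recall problem), while the picture file is returned even though the user was looking for software (a precision problem).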
Summary of the invention
In view of the problems and shortcomings of the retrieval methods in existing search engines described above, the purpose of the present invention is to provide a semantics-based retrieval method for a file search engine, thereby improving recall and precision.
The present invention is achieved as follows: a semantics-based retrieval method in a file search engine, comprising the following steps:
1) establish a resource information base, and at the same time establish the matching relationships between this resource information base and files and between this resource information base and the query words users enter;
2) after the user enters a query word, first match it against the resource information base; if the match succeeds, use the matching relationship between the resource information in the base and files to find the corresponding files, and return the search results; if the match fails, search for files directly with the query word and return the search results.
Further, the resource information in the resource information base includes the resource category and a resource introduction.
Further, the resource information base classifies resource information by resource category.
Further, the matching relationship between a resource in the resource information base and files includes information such as the file name, size, file type, and path of the files corresponding to the resource.
Further, the matching relationship between a resource in the resource information base and user query words includes the query words a user might enter when searching for that resource.
The present invention first builds a resource information base for the user. When the user enters a query word, it is first matched against the resource information base, and the resource information that was matched is then used to match files.
The present invention uses several pieces of information about each basic resource to query files. When the user queries with one title, the system internally also queries with the other resource information, so recall is improved. The present invention processes only trusted files and restricts the size and extension type of the files it matches, so precision is improved accordingly.
Embodiment
To illustrate the soundness and practicality of the present invention, search engines are briefly analyzed below. The analysis considers two aspects: the files that file servers provide and the query needs of users. Although the files provided by current file servers are huge in number and varied in kind, their types are highly concentrated. The inventors gathered statistics by file extension on 13,306,765 files from 839 FTP servers covered by the Tianwang file search engine and found that the files mainly fall into types such as music, video, executable, and compressed files, so the concentration of categories is clear. Statistical analysis of the query words submitted to the search engine further shows that these are also the files users query most often. Statistics gathered on other sites are consistent with these results. This shows that although user query words vary enormously and the content of each file server differs, the main supply and demand on both sides fall into a few relatively concentrated categories: specifically, software, singers, songs, films, and games. This is an important feature of file search engines that distinguishes them from Web search engines. A Web search aims to obtain specific web pages that contain specified information, whereas the purpose of using a file search engine is to obtain the download address of a certain logical resource.
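A count by file extension of the kind described above could be sketched as follows. The extension-to-category table and the function name `category_counts` are assumptions made for illustration; the patent does not list which extensions were assigned to which type.

```python
from collections import Counter
import posixpath

# Hypothetical mapping from extension to file type; not taken from the patent.
CATEGORY_BY_EXT = {
    "mp3": "music",
    "avi": "video",
    "exe": "executable",
    "zip": "compressed",
}


def category_counts(paths):
    """Count files per type, based only on the file extension."""
    counts = Counter()
    for path in paths:
        ext = posixpath.splitext(path)[1].lstrip(".").lower()
        counts[CATEGORY_BY_EXT.get(ext, "other")] += 1
    return counts


# Invented sample paths standing in for an FTP file listing.
sample = ["/a/song.MP3", "/b/movie.avi", "/c/setup.exe",
          "/d/pack.zip", "/e/notes.txt", "/f/other.mp3"]
counts = category_counts(sample)
```

Sorting the result with `counts.most_common()` gives the ranking of file types from which a concentration in a few categories would be visible.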
The present invention first needs to build a basic resource base that contains descriptive information about files. This information can generally be obtained from professional download sites on the Web by means of information extraction. Taking "NetAnts" as an example, the information held in the resource base is as follows:
[software name] NetAnts netants;
[software Chinese name] NetAnts;
[software English name] netants;
[category] /Software/Network tools;
[software size] 871424 bytes;
[developer] http://www.netants.com/;
[language] Simplified Chinese;
[platform] win9x/me/nt/2000/xp;
[detailed description] NetAnts is a tool for downloading files. This version has the Chinese edition built in. Features of NetAnts: it further extends resumable downloading and can transfer from multiple points at once. New features: support for plug-in resource packs, and a download speed display in the drop basket.
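One way to hold a record like the NetAnts entry above in memory is a small data class. This is only a sketch: the class name `Resource` and its field names are assumptions chosen for this example, while the values are copied from the record.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    """One entry of the resource base (field names are illustrative)."""
    names: tuple          # titles under which the resource is known
    category: str         # resource category used for classification
    size_bytes: int       # size of the software package
    developer_url: str
    language: str
    platforms: tuple
    description: str      # introduction returned alongside search results


netants = Resource(
    names=("NetAnts", "netants"),
    category="/Software/Network tools",
    size_bytes=871424,
    developer_url="http://www.netants.com/",
    language="Simplified Chinese",
    platforms=("win9x", "me", "nt", "2000", "xp"),
    description="NetAnts is a tool for downloading files.",
)
```

Keeping the size and category next to the names is what later allows matched files to be checked against more than a title string.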
Query-word matching features are established for each resource, usually the Chinese and English titles of the resource, which ultimately forms a many-to-many correspondence between resources and query words. Besides different query words corresponding to the same resource, the present invention also supports one query word corresponding to several resources.
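The many-to-many correspondence can be sketched with two dictionaries of sets, one per direction. The function name `build_indexes`, the resource identifiers, and the sample pairs are hypothetical and serve only to show the shape of the data.

```python
from collections import defaultdict


def build_indexes(pairs):
    """Build both directions of a many-to-many query-word/resource mapping."""
    by_query = defaultdict(set)     # query word -> resource ids
    by_resource = defaultdict(set)  # resource id -> query words
    for query_word, resource_id in pairs:
        key = query_word.lower()
        by_query[key].add(resource_id)
        by_resource[resource_id].add(key)
    return by_query, by_resource


# Hypothetical (query word, resource id) pairs.
pairs = [
    ("NetAnts", "sw:netants"),
    ("网络蚂蚁", "sw:netants"),           # two query words, one resource
    ("Titanic", "film:titanic"),
    ("Titanic", "song:titanic-theme"),  # one query word, two resources
]
by_query, by_resource = build_indexes(pairs)
```

Here `by_query["titanic"]` holds two resources, and `by_resource["sw:netants"]` holds two query words, which covers both cases the text mentions.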
The resource base of the present invention also classifies the various files by category, such as the software, song, film, and game categories mentioned above, to make matching easier.
When a query is submitted, the user's query word is mapped to specific matching resource information, and that resource information is then used to match files. File matching with resource information is done as string matching, that is, each descriptor is matched separately. The file entities that match the resource are then output, together with the descriptive information of that resource. Here, a file entity is a uniquely locatable file or a directory made up of several files. It includes the file or directory name, time, size, and path. For example, an FTP address with a definite URL is a file entity. The purpose of a user's query in a file search engine is to obtain one or more file entities.
If no resource is matched, the present invention likewise supports a conventional query by simple string matching.
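The query flow described above, including the fallback, might look like the following sketch. The dictionary layout, the function name `search`, the extension restriction per resource, and the example FTP paths are all assumptions for illustration rather than the patent's actual data model.

```python
def search(query, resources, files):
    """Match the query to resources first; fall back to plain string matching."""
    q = query.strip().lower()
    matched = [r for r in resources if q in r["query_words"]]
    if not matched:
        # No resource matched: conventional substring search, no description.
        return [(f["path"], None) for f in files if q in f["name"].lower()]
    results = []
    for r in matched:
        for f in files:
            name = f["name"].lower()
            ext = name.rsplit(".", 1)[-1] if "." in name else ""
            # Each descriptor is tried separately; the extension check
            # filters out files of the wrong type.
            if ext in r["extensions"] and any(d in name for d in r["descriptors"]):
                results.append((f["path"], r["intro"]))
    return results


# Hypothetical resource base and file list.
resources = [{
    "query_words": {"netants", "网络蚂蚁"},
    "descriptors": ["netants", "网络蚂蚁"],
    "extensions": {"zip", "exe"},
    "intro": "NetAnts is a tool for downloading files.",
}]
files = [
    {"name": "netants.zip", "path": "ftp://ftp.example.org/pub/netants.zip"},
    {"name": "网络蚂蚁.exe", "path": "ftp://ftp.example.org/pub/网络蚂蚁.exe"},
    {"name": "netants_logo.jpg", "path": "ftp://ftp.example.org/img/netants_logo.jpg"},
    {"name": "readme.txt", "path": "ftp://ftp.example.org/pub/readme.txt"},
]

semantic_hits = search("NetAnts", resources, files)
fallback_hits = search("readme", resources, files)
```

In this sketch a query for "NetAnts" returns both software files, including the one named in another language, with the resource introduction attached and the picture file excluded, while "readme" matches no resource and is served by the plain string search.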
Claims (6)
1. A semantics-based retrieval method in a file search engine, comprising the following steps:
1) establish a resource information base, and at the same time establish the matching relationships between this resource information base and files and between this resource information base and the query words users enter;
2) after the user enters a query word, first match it against the resource information base; if the match succeeds, use the matching relationship between the resource information in the base and files to find the corresponding files, and return the search results; if the match fails, search for files directly with the query word and return the search results.
2. The semantics-based retrieval method in a file search engine according to claim 1, characterized in that the resource information in the resource information base includes the resource category and a resource introduction.
3. The semantics-based retrieval method in a file search engine according to claim 1, characterized in that the matching relationship between a resource in the resource information base and files includes the file name, size, and file type of the files corresponding to the resource.
4. The semantics-based retrieval method in a file search engine according to claim 1, characterized in that the matching relationship between a resource in the resource information base and user query words includes the query words a user might enter when searching for that resource.
5. The semantics-based retrieval method in a file search engine according to claim 2, characterized in that the resource information base classifies resource information by resource category.
6. The semantics-based retrieval method in a file search engine according to claim 3, characterized in that the files are specifically files whose file information is trusted.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN 200410009691 CN1763739A (en) | 2004-10-21 | 2004-10-21 | Search method based on semantics in search engine |
Publications (1)
Publication Number | Publication Date |
---|---|
CN1763739A (en) | 2006-04-26 |
Family
ID=36747877
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN 200410009691 Pending CN1763739A (en) | 2004-10-21 | 2004-10-21 | Search method based on semantics in search engine |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN1763739A (en) |
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN100442292C (en) * | 2007-03-22 | 2008-12-10 | 华中科技大学 | Method for indexing and acquiring semantic net information |
CN101251841B (en) * | 2007-05-17 | 2011-06-29 | 华东师范大学 | Method for establishing and searching feature matrix of Web document based on semantics |
CN102253982A (en) * | 2011-06-24 | 2011-11-23 | 北京理工大学 | Query suggestion method based on query semantics and click-through data |
CN102799648A (en) * | 2012-06-28 | 2012-11-28 | 用友软件股份有限公司 | Retrieval device and method |
CN103092945A (en) * | 2013-01-11 | 2013-05-08 | 北京百度网讯科技有限公司 | Searching method and device based on interface returning |
CN103092945B (en) * | 2013-01-11 | 2019-11-26 | 北京百度网讯科技有限公司 | A kind of searching method and device returned based on interface |
CN103559313A (en) * | 2013-11-20 | 2014-02-05 | 北京奇虎科技有限公司 | Searching method and device |
CN103559313B (en) * | 2013-11-20 | 2018-02-23 | 北京奇虎科技有限公司 | Searching method and device |
CN107943822A (en) * | 2017-10-13 | 2018-04-20 | 南京邮电大学 | OGC geographic information services semantic retrieving methods based on MIML |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US10210184B2 (en) | Methods and systems for enhancing metadata | |
CN102063476B (en) | Video searching method and system | |
US8938459B2 (en) | System and method for distributed index searching of electronic content | |
US8768912B2 (en) | System and method for geographically organizing and classifying businesses on the world-wide web | |
CA2511098A1 (en) | Dispersing search engine results by using page category information | |
US20200175081A1 (en) | Server, method and system for providing information search service by using sheaf of pages | |
CN1728134A (en) | Multi-language network information search method and system based on supertext | |
CN1360267A (en) | Sorting and searching method for files | |
CN103617174A (en) | Distributed searching method based on cloud computing | |
CN102622402B (en) | Server, method and system for providing information search service by using sheaf of pages | |
CN109542930A (en) | A kind of data efficient search method based on ElasticSearch | |
CN1763739A (en) | Search method based on semantics in search engine | |
Li et al. | Research on web mining-based intelligent search engine | |
Ozmutlu et al. | Trends in multimedia web searching: excite queries | |
KR20020087602A (en) | System and method for searching music information through the internet | |
Zhang | Efficient indexing and query processing in distributed search engines |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C02 | Deemed withdrawal of patent application after publication (patent law 2001) | ||
WD01 | Invention patent application deemed withdrawn after publication |