CN1763739A - Search method based on semantics in search engine

Info

Publication number
CN1763739A
Authority
CN
China
Prior art keywords
file
resource
resource information
search
search engine
Prior art date
2004-10-21
Legal status
Pending
Application number
CN 200410009691
Other languages
Chinese (zh)
Inventor
谢欣 (Xie Xin)
李晓明 (Li Xiaoming)
Current Assignee
Peking University
Original Assignee
Peking University
Priority date
2004-10-21
Filing date
2004-10-21
Publication date
2006-04-26

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a semantics-based search method for a file search engine, characterized by the following: a resource information base is built, together with matching relationships between the resource information base and files and between the resource information base and user query words. If a query word matches the resource information base, the matching relationships are used to find the corresponding files; if it fails to match, files are searched directly with the query word. In both cases the search results are returned.

Description

Semantics-based search method in a file search engine
Technical field
The present invention relates to information retrieval, and in particular to a semantics-based search method in a file search engine.
Background art
Search engines can be divided into Web search engines and file search engines. Currently known file search engines, such as the Peking University Tianwang file search engine (http://bingle.pku.edu.cn), search by string matching: the user enters a query word, and the retrieval system finds, among all file entries to be searched, those that contain that string and returns them to the user. Both the recall and the precision of this method are low. First, many file names are not standardized, and many resources, such as software and films, have names in different languages. When a user queries with the name in one language but the file is named in another, the file entry cannot be retrieved, which lowers recall. Second, the string a user enters often carries semantic information, and plain string matching tends to return file entries the user does not want; for example, a picture file whose name contains the query word is usually not an accurate result for a software query. Moreover, current retrieval tools can only return query results and cannot provide more useful additional descriptive information.
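The string-matching baseline described above can be sketched in a few lines. This is a minimal illustration, not the Tianwang implementation: the `FileEntry` type, the `substring_search` function, and the sample file names and FTP paths are all hypothetical. It reproduces both failure modes named in the text: a Chinese-named installer is missed by an English query (lower recall), while an unrelated image file is returned (lower precision).

```python
from dataclasses import dataclass


@dataclass
class FileEntry:
    """A crawled file entry (hypothetical structure)."""
    name: str
    path: str


def substring_search(query: str, entries: list[FileEntry]) -> list[FileEntry]:
    """Return every entry whose file name contains the query (case-insensitive)."""
    q = query.casefold()
    return [e for e in entries if q in e.name.casefold()]


# Hypothetical index: the installer is named in Chinese, the image in English.
SAMPLE_ENTRIES = [
    FileEntry("网络蚂蚁1.25.exe", "ftp://ftp.example.edu/soft/"),
    FileEntry("netants_logo.jpg", "ftp://ftp.example.edu/pics/"),
]

for hit in substring_search("NetAnts", SAMPLE_ENTRIES):
    print(hit.path + hit.name)  # only the logo image is printed
```

The English query never reaches the installer, because nothing links its Chinese file name to the English resource name; the resource information base introduced below is meant to supply exactly that link.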
Summary of the invention
To address the problems and shortcomings of information retrieval in the existing search engines described above, the purpose of this invention is to provide a semantics-based search method for a file search engine, thereby improving both recall and precision.
The present invention is achieved as follows: a semantics-based search method in a file search engine, comprising the following steps:
1) Build a resource information base, and establish the matching relationships between this resource information base and files, and between it and user query words;
2) After the user enters a query word, first match it against the resource information base. If the match succeeds, use the matching relationship between the resource information in the base and files to find the corresponding files, and return the search results; if the match fails, search files directly with the query word, and return the search results.
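Steps 1) and 2) can be sketched as follows. This is a minimal sketch under stated assumptions, not the patented implementation: the `Resource` fields, the exact-word lookup against `query_words`, the substring test against `file_keywords`, and all names and sample data are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Resource:
    """One entry in the resource information base (illustrative fields)."""
    name: str
    query_words: set[str]    # words a user might enter for this resource
    file_keywords: set[str]  # strings expected in the names of its files


def search(query: str, resource_base: list[Resource], file_names: list[str]) -> list[str]:
    q = query.strip().casefold()
    # First, match the query word against the resource information base.
    matched = [r for r in resource_base
               if q in {w.casefold() for w in r.query_words}]
    if matched:
        # Match succeeded: use each resource's file-matching relationship.
        keywords = {k.casefold() for r in matched for k in r.file_keywords}
        return [f for f in file_names if any(k in f.casefold() for k in keywords)]
    # Match failed: fall back to searching file names with the query word itself.
    return [f for f in file_names if q in f.casefold()]


# Hypothetical sample data.
RESOURCE_BASE = [Resource("NetAnts", {"NetAnts", "网络蚂蚁"}, {"netants", "网络蚂蚁"})]
FILES = ["网络蚂蚁1.25.exe", "netants.zip", "flashget.exe"]

print(search("网络蚂蚁", RESOURCE_BASE, FILES))  # both NetAnts files
print(search("flashget", RESOURCE_BASE, FILES))  # fallback string match
```

A query in either language now reaches both NetAnts files, while an unknown word such as `flashget` still gets the conventional string-matching behaviour.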
Further, the resource information in the resource information base includes the resource category and a resource description.
Further, the resource information base classifies resource information by resource category.
Further, the matching relationship between a resource in the resource information base and files includes information about the resource's corresponding files, such as file name, size, file type, and path.
Further, the matching relationship between a resource in the resource information base and user query words includes the query words that a user may enter when searching for that resource.
The present invention first builds a resource information base for the user. When the user enters a query word, it is first matched in the resource information base, and the matched resource information is then used to match files.
The present invention uses multiple pieces of information about each basic resource to query files. Therefore, when a user queries by one name, the system internally also queries with the resource's other information, which improves recall. The present invention processes only trusted files and restricts the size and extension type of matched files, which correspondingly improves precision.
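The precision measures above (trusted files only, restricted size and extension) amount to a filter applied to candidate files. The sketch below assumes a per-category rule table; the `CATEGORY_RULES` values, the `trusted` flag, and all names are hypothetical, since the text does not give concrete limits.

```python
import os
from dataclasses import dataclass


@dataclass
class FileRecord:
    name: str
    size: int      # bytes
    trusted: bool  # assumed flag: the file's metadata has been verified


# Hypothetical rules: allowed extensions and a size range in bytes per category.
CATEGORY_RULES = {
    "software": ({".exe", ".zip", ".rar"}, 100_000, 500_000_000),
    "song": ({".mp3", ".wma"}, 500_000, 50_000_000),
}


def passes_filter(f: FileRecord, category: str) -> bool:
    """Keep only trusted files whose extension and size fit the category."""
    allowed_exts, min_size, max_size = CATEGORY_RULES[category]
    ext = os.path.splitext(f.name)[1].lower()
    return f.trusted and ext in allowed_exts and min_size <= f.size <= max_size


candidates = [
    FileRecord("netants.zip", 871_424, True),
    FileRecord("netants.jpg", 40_000, True),    # wrong type for a software query
    FileRecord("netants.exe", 871_424, False),  # file information not trusted
]
print([f.name for f in candidates if passes_filter(f, "software")])
```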
Embodiment
To illustrate the soundness and practicality of the invention, a brief analysis of search engines follows, covering two aspects: the files provided by file servers and the query demand of users. Although current file servers provide an enormous number and variety of files, their types are highly concentrated. We gathered statistics, by file extension, on the 13,306,765 files on the 839 FTP servers indexed by the Tianwang file search engine, and found that the files are concentrated in types such as music, video, executables, and archives, and that the categories are clearly distinguishable. Statistics and analysis of the query words submitted to the search engine show that these are also the file types users query most often, and statistics from other websites agree with these results. This shows that, although user query words vary widely and the content of each file server differs, the main supply and demand on both sides fall into a relatively small number of categories: specifically, software, singers, songs, films, and games. This is a key characteristic of file search engines that distinguishes them from Web search engines. Web search aims to obtain a specific web page containing specified information, whereas the purpose of using a file search engine is to obtain the download address of a certain logical resource.
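The extension-based statistics described above can be sketched as a simple histogram. The extension-to-category table here is an assumption for illustration; the text reports only the resulting categories, not the mapping that was used, and `category_histogram` is a hypothetical name.

```python
import os
from collections import Counter

# Hypothetical extension-to-category table.
EXT_CATEGORY = {
    ".mp3": "music", ".wma": "music",
    ".avi": "video", ".rm": "video", ".rmvb": "video",
    ".exe": "executable",
    ".zip": "archive", ".rar": "archive",
}


def category_histogram(file_names: list[str]) -> Counter:
    """Count files per category, using the lower-cased extension as the key."""
    counts: Counter = Counter()
    for name in file_names:
        ext = os.path.splitext(name)[1].lower()
        counts[EXT_CATEGORY.get(ext, "other")] += 1
    return counts


print(category_histogram(["a.mp3", "b.MP3", "c.rar", "d.txt", "e.exe"]).most_common())
```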
The present invention first needs to build a basic resource base containing descriptive information about files. This information can generally be obtained from professional download sites on the Web by information extraction. Taking "NetAnts" as an example, a record in this resource base looks as follows:
[dbase] NetAnts netants;
[Chinese name] NetAnts;
[English name] netants;
[category] /Software/Network tools;
[size] 871424 Bytes;
[developer] http://www.netants.com/;
[language] Simplified Chinese;
[platform] win9x/me/nt/2000/xp;
[detailed description] NetAnts is a tool for downloading files. This version includes a built-in Chinese edition. Its main feature is that it further extends resumable (breakpoint) transfer and can download from multiple points at once. New features: support for plug-in resource packages, and a download-speed display in the drag-and-drop basket.
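Records in the `[field] value;` format shown above can be loaded with a small line-oriented parser. This sketch assumes one field per line and treats the trailing semicolon as optional, since the last field of the example record has none; `parse_record`, `FIELD_RE`, and the sample text are hypothetical.

```python
import re

# "[field] value;" with an optional trailing semicolon.
FIELD_RE = re.compile(r"^\[(?P<key>[^\]]+)\]\s*(?P<value>.*?);?\s*$")


def parse_record(text: str) -> dict[str, str]:
    record: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines between fields
        m = FIELD_RE.match(line)
        if m is None:
            raise ValueError(f"unrecognized line: {line!r}")
        record[m.group("key").strip()] = m.group("value").strip()
    return record


SAMPLE = """
[English name] netants;
[size] 871424 Bytes;
[developer] http://www.netants.com/;
[description] A tool for downloading files
"""
print(parse_record(SAMPLE))
```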
Query-word matching features are established for each resource, usually the resource's Chinese and English names, which finally forms a many-to-many correspondence between resources and query words. Besides allowing different query words to correspond to one resource, the present invention also supports one query word corresponding to multiple resources.
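The many-to-many correspondence between query words and resources maps naturally onto two dictionaries of sets. In this sketch the `QueryIndex` class, the case-folding normalization, and the resource IDs are assumptions; the text specifies only that one word may map to several resources and one resource to several words.

```python
from collections import defaultdict


class QueryIndex:
    """Many-to-many mapping between query words and resource IDs."""

    def __init__(self) -> None:
        self._word_to_resources: defaultdict[str, set[str]] = defaultdict(set)
        self._resource_to_words: defaultdict[str, set[str]] = defaultdict(set)

    @staticmethod
    def _normalize(word: str) -> str:
        # Assumed normalization: trim and case-fold; the text does not specify one.
        return word.strip().casefold()

    def add(self, resource_id: str, words: list[str]) -> None:
        for w in words:
            key = self._normalize(w)
            self._word_to_resources[key].add(resource_id)
            self._resource_to_words[resource_id].add(key)

    def resources_for(self, query: str) -> set[str]:
        # .get() avoids inserting empty entries for unknown words.
        return set(self._word_to_resources.get(self._normalize(query), set()))


idx = QueryIndex()
idx.add("netants", ["NetAnts", "网络蚂蚁"])            # two words, one resource
idx.add("titanic-film", ["Titanic", "泰坦尼克号"])
idx.add("titanic-soundtrack", ["Titanic"])           # one word, two resources
print(sorted(idx.resources_for("TITANIC")))
```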
The resource base of the present invention also classifies files into categories such as the software, songs, films, and games mentioned above, to make matching easier.
When a query is submitted, the user's query word is mapped to specific matchable resource information, and file matching is then carried out with that resource information. File matching with resource information is done by string matching, that is, by matching against each descriptor separately. The document entities that successfully match the resource are then output, together with the resource's descriptive information. Here, a document entity is a uniquely locatable file or a directory made up of multiple files; it includes the file or directory name, time, size, and path. For example, an FTP address with a definite URL is a document entity. In a file search engine, the purpose of a user's query request is to obtain one or more document entities.
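The resource-to-file step can be sketched as follows: each descriptor string of the matched resource is tested against the name of every document entity, and the hits are returned together with the resource's description. `DocumentEntity` mirrors the fields listed above (name, time, size, path); the function names, the case-insensitive substring test, the dictionary layout of a resource, and the sample data are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DocumentEntity:
    """A uniquely locatable file or directory."""
    name: str
    time: str
    size: int  # bytes
    path: str


def match_entities(descriptors: list[str], entities: list[DocumentEntity]) -> list[DocumentEntity]:
    """Keep entities whose name contains any descriptor (case-insensitive)."""
    needles = [d.casefold() for d in descriptors if d]
    return [e for e in entities if any(n in e.name.casefold() for n in needles)]


def answer(resource: dict, entities: list[DocumentEntity]) -> dict:
    """Return the matched entities plus the resource's descriptive information."""
    hits = match_entities(resource["descriptors"], entities)
    return {"description": resource["description"], "entities": hits}


ENTITIES = [
    DocumentEntity("NetAnts1.25.exe", "2004-05-01", 871_424, "ftp://ftp.example.edu/soft/"),
    DocumentEntity("winamp.exe", "2004-05-01", 2_000_000, "ftp://ftp.example.edu/soft/"),
]
RESOURCE = {"descriptors": ["NetAnts", "网络蚂蚁"], "description": "A download tool."}
result = answer(RESOURCE, ENTITIES)
print(result["description"], [e.path + e.name for e in result["entities"]])
```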
If no matchable resource is found, the present invention likewise supports a conventional query by simple string matching.

Claims (6)

1. A semantics-based search method in a file search engine, comprising the following steps:
1) building a resource information base, and establishing the matching relationships between this resource information base and files, and between it and user query words;
2) after the user enters a query word, first matching it against the resource information base; if the match succeeds, using the matching relationship between the resource information in the base and files to find the corresponding files, and returning the search results; if the match fails, searching files directly with the query word, and returning the search results.
2. The semantics-based search method in a file search engine according to claim 1, characterized in that the resource information in the resource information base includes the resource category and a resource description.
3. The semantics-based search method in a file search engine according to claim 1, characterized in that the matching relationship between a resource in the resource information base and files includes the file name, size, and file type of the resource's corresponding files.
4. The semantics-based search method in a file search engine according to claim 1, characterized in that the matching relationship between a resource in the resource information base and user query words includes the query words that a user may enter when searching for that resource.
5. The semantics-based search method in a file search engine according to claim 2, characterized in that the resource information base classifies resource information by resource category.
6. The semantics-based search method in a file search engine according to claim 3, characterized in that the said files are specifically files whose file information is trusted.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN 200410009691 CN1763739A (en) 2004-10-21 2004-10-21 Search method based on semantics in search engine

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN 200410009691 CN1763739A (en) 2004-10-21 2004-10-21 Search method based on semantics in search engine

Publications (1)

Publication Number Publication Date
CN1763739A (en) 2006-04-26

Family

ID=36747877

Family Applications (1)

Application Number Title Priority Date Filing Date
CN 200410009691 Pending CN1763739A (en) 2004-10-21 2004-10-21 Search method based on semantics in search engine

Country Status (1)

Country Link
CN (1) CN1763739A (en)

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN100442292C (en) * 2007-03-22 2008-12-10 华中科技大学 Method for indexing and acquiring semantic net information
CN101251841B (en) * 2007-05-17 2011-06-29 华东师范大学 Method for establishing and searching feature matrix of Web document based on semantics
CN102253982A (en) * 2011-06-24 2011-11-23 北京理工大学 Query suggestion method based on query semantics and click-through data
CN102799648A (en) * 2012-06-28 2012-11-28 用友软件股份有限公司 Retrieval device and method
CN103092945A (en) * 2013-01-11 2013-05-08 北京百度网讯科技有限公司 Searching method and device based on interface returning
CN103092945B (en) * 2013-01-11 2019-11-26 北京百度网讯科技有限公司 Search method and device based on interface return
CN103559313A (en) * 2013-11-20 2014-02-05 北京奇虎科技有限公司 Searching method and device
CN103559313B (en) * 2013-11-20 2018-02-23 北京奇虎科技有限公司 Searching method and device
CN107943822A (en) * 2017-10-13 2018-04-20 南京邮电大学 OGC geographic information services semantic retrieving methods based on MIML

Similar Documents

Publication Publication Date Title
US10210184B2 (en) Methods and systems for enhancing metadata
CN102063476B (en) Video searching method and system
US8938459B2 (en) System and method for distributed index searching of electronic content
US8768912B2 (en) System and method for geographically organizing and classifying businesses on the world-wide web
CA2511098A1 (en) Dispersing search engine results by using page category information
US20200175081A1 (en) Server, method and system for providing information search service by using sheaf of pages
CN1728134A (en) Multi-language network information search method and system based on hypertext
CN1360267A (en) Sorting and searching method for files
CN103617174A (en) Distributed searching method based on cloud computing
CN102622402B (en) Server, method and system for providing information search service by using sheaf of pages
CN109542930A (en) An efficient data search method based on ElasticSearch
CN1763739A (en) Search method based on semantics in search engine
Li et al. Research on web mining-based intelligent search engine
Ozmutlu et al. Trends in multimedia web searching: excite queries
KR20020087602A (en) System and method for searching music information through the internet
Zhang Efficient indexing and query processing in distributed search engines

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C02 Deemed withdrawal of patent application after publication (patent law 2001)
WD01 Invention patent application deemed withdrawn after publication